Manufacture and expression of large structural genes

ABSTRACT

Illustrated is the preparation and expression of manufactured genes capable of directing synthesis of human immune and leukocyte interferons and of other biologically active proteinaceous products, which products differ from naturally-occurring forms in terms of the identity and/or relative position of one or more amino acids, and in terms of one or more biological and pharmacological properties but which substantially retain other such properties.

This is a continuation-in-part of U.S. patent application Ser. No. 06/375,494, filed May 6, 1982 and now abandoned.

The present invention relates generally to the manipulation of genetic materials and, more particularly, to the manufacture of specific DNA sequences useful in recombinant procedures to secure the production of proteins of interest.

Genetic materials may be broadly defined as those chemical substances which program for and guide the manufacture of constituents of cells and viruses and direct the responses of cells and viruses. A long chain polymeric substance known as deoxyribonucleic acid (DNA) comprises the genetic material of all living cells and viruses except for certain viruses which are programmed by ribonucleic acids (RNA). The repeating units in DNA polymers are four different nucleotides, each of which consists of either a purine (adenine or guanine) or a pyrimidine (thymine or cytosine) bound to a deoxyribose sugar to which a phosphate group is attached. Attachment of nucleotides in linear polymeric form is by means of fusion of the 5′ phosphate of one nucleotide to the 3′ hydroxyl group of another. Functional DNA occurs in the form of stable double stranded associations of single strands of nucleotides (known as deoxyoligonucleotides), which associations occur by means of hydrogen bonding between purine and pyrimidine bases [i.e., “complementary” associations existing either between adenine (A) and thymine (T) or guanine (G) and cytosine (C)]. By convention, nucleotides are referred to by the names of their constituent purine or pyrimidine bases, and the complementary associations of nucleotides in double stranded DNA (i.e., A-T and G-C) are referred to as “base pairs”. Ribonucleic acid is a polynucleotide comprising adenine, guanine, cytosine and uracil (U), rather than thymine, bound to ribose and a phosphate group.
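The base-pairing rule just described (A with T, G with C) can be illustrated with a minimal Python sketch that derives the complementary strand of a given single strand. The function name and the example sequence are assumptions made for illustration only and form no part of the disclosure.

```python
# Complementary associations as stated in the text: A-T and G-C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(top: str) -> str:
    """Return the complementary strand, written 5'->3', of a strand written 5'->3'."""
    # Complement every base and reverse the result, because the two strands
    # of a duplex run in opposite (antiparallel) orientations.
    return "".join(PAIRS[base] for base in reversed(top.upper()))

print(complement_strand("ATGGCT"))  # -> AGCCAT
```

Applying the function twice returns the original strand, which is a convenient sanity check on any pairing table of this kind.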

Most briefly put, the programming function of DNA is generally effected through a process wherein specific DNA nucleotide sequences (genes) are “transcribed” into relatively unstable messenger RNA (mRNA) polymers. The mRNA, in turn, serves as a template for the formation of structural, regulatory and catalytic proteins from amino acids. This translation process involves the operations of small RNA strands (tRNA) which transport and align individual amino acids along the mRNA strand to allow for formation of polypeptides in proper amino acid sequences. The mRNA “message”, derived from DNA and providing the basis for the tRNA supply and orientation of any given one of the twenty amino acids for polypeptide “expression”, is in the form of triplet “codons”, i.e., sequential groupings of three nucleotide bases. In one sense, the formation of a protein is the ultimate form of “expression” of the programmed genetic message provided by the nucleotide sequence of a gene.
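The two steps described above, transcription of DNA into mRNA and translation of triplet codons into amino acids, may be sketched in Python as follows. The sketch assumes the standard genetic code and treats the input as the coding strand read 5′ to 3′; the function names and the short test sequences are illustrative assumptions only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code as one-letter amino acid symbols, listed in TCAG
# codon order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def transcribe(coding_dna: str) -> str:
    """Return the mRNA sequence corresponding to a DNA coding strand (T -> U)."""
    return coding_dna.upper().replace("T", "U")

def translate(coding_dna: str) -> str:
    """Translate a DNA coding strand codon by codon, stopping at the first stop codon."""
    dna = coding_dna.upper()
    protein = []
    # Ignore any trailing one or two bases that do not form a full codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(transcribe("TGTTACTGC"))  # -> UGUUACUGC
print(translate("TGTTACTGC"))   # -> CYC (Cys-Tyr-Cys)
```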

Certain DNA sequences which usually “precede” a gene in a DNA polymer provide a site for initiation of the transcription into mRNA. These are referred to as “promoter” sequences. Other DNA sequences, also usually “upstream” of (i.e., preceding) a gene in a given DNA polymer, bind proteins that determine the frequency (or rate) of transcription initiation. These other sequences are referred to as “regulator” sequences. Thus, sequences which precede a selected gene (or series of genes) in a functional DNA polymer and which operate to determine whether the transcription (and eventual expression) of a gene will take place are collectively referred to as “promoter/regulator” or “control” DNA sequences. DNA sequences which “follow” a gene in a DNA polymer and provide a signal for termination of the transcription into mRNA are referred to as “terminator” sequences.

A focus of microbiological processing for nearly the last decade has been the attempt to manufacture industrially and pharmaceutically significant substances using organisms which do not initially have genetically coded information concerning the desired product included in their DNA. Simply put, a gene that specifies the structure of a product is either isolated from a “donor” organism or chemically synthesized and then stably introduced into another organism which is preferably a self-replicating unicellular microorganism. Once this is done, the existing machinery for gene expression in the “transformed” host cells operates to construct the desired product.

The art is rich in patent and literature publications relating to “recombinant DNA” methodologies for the isolation, synthesis, purification and amplification of genetic materials for use in the transformation of selected host organisms. U.S. Pat. No. 4,237,224 to Cohen, et al., for example, relates to transformation of procaryotic unicellular host organisms with “hybrid” viral or circular plasmid DNA which includes selected exogenous DNA sequences. The procedures of the Cohen, et al. patent first involve manufacture of a transformation vector by enzymatically cleaving viral or circular plasmid DNA to form linear DNA strands. Selected foreign DNA strands are also prepared in linear form through use of similar enzymes. The linear viral or plasmid DNA is incubated with the foreign DNA in the presence of ligating enzymes capable of effecting a restoration process and “hybrid” vectors are formed which include the selected foreign DNA segment “spliced” into the viral or circular DNA plasmid.

Transformation of compatible unicellular host organisms with the hybrid vector results in the formation of multiple copies of the foreign DNA in the host cell population. In some instances, the desired result is simply the amplification of the foreign DNA and the “product” harvested is DNA. More frequently, the goal of transformation is the expression by the host cells of the foreign DNA in the form of large scale synthesis of isolatable quantities of commercially significant protein or polypeptide fragments coded for by the foreign DNA. See also, e.g., U.S. Pat. No. 4,269,731 (to Shine), U.S. Pat. No. 4,273,875 (to Manis) and U.S. Pat. No. 4,293,652 (to Cohen).

The success of procedures such as described in the Cohen, et al. patent is due in large part to the ready availability of “restriction endonuclease” enzymes which facilitate the site-specific cleavage of both the unhybridized DNA vector and, e.g., eukaryotic DNA strands containing the foreign sequences of interest. Cleavage in a manner providing for the formation of single stranded complementary “ends” on the double stranded linear DNA strands greatly enhances the likelihood of functional incorporation of the foreign DNA into the vector upon “ligating” enzyme treatment. A large number of such restriction endonuclease enzymes are currently commercially available [See, e.g., “BRL Restriction Endonuclease Reference Chart” appearing in the “'81/'82 Catalog” of Bethesda Research Laboratories, Inc., Gaithersburg, Md.]. Verification of hybrid formation is facilitated by chromatographic techniques which can, for example, distinguish the hybrid plasmids from non-hybrids on the basis of molecular weight. Other useful verification techniques involve radioactive DNA hybridization.

Another manipulative “tool” largely responsible for successes in transformation of procaryotic cells is the use of selected “marker” gene sequences. Briefly put, hybrid vectors are employed which contain, in addition to the desired foreign DNA, one or more DNA sequences which code for expression of a phenotype trait capable of distinguishing transformed from non-transformed host cells. Typical marker gene sequences are those which allow a transformed procaryotic cell to survive and propagate in a culture medium containing metals, antibiotics, and like components which would kill or severely inhibit propagation of non-transformed host cells.

Successful expression of an exogenous gene in a transformed host microorganism depends to a great extent on incorporation of the gene into a transformation vector with a suitable promoter/regulator region present to insure transcription of the gene into mRNA and other signals which insure translation of the mRNA message into protein (e.g., ribosome binding sites).

It is not often the case that the “original” promoter/regulator region of a gene will allow for high levels of expression in the new host. Consequently, the gene to be inserted must either be fitted with a new, host-accommodated transcription and translation regulating DNA sequence prior to insertion or it must be inserted at a site where it will come under the control of existing transcription and translation signals in the vector DNA.

It is frequently the case that the insertion of an exogenous gene into, e.g., a circular DNA plasmid vector, is performed at a site either immediately following an extant transcription and translation signal or within an existing plasmid-borne gene coding for a rather large protein which is the subject of high degrees of expression in the host. In the latter case, the host's expression of the “fusion gene” so formed results in high levels of production of a “fusion protein” including the desired protein sequence (e.g., as an intermediate segment which can be isolated by chemical cleavage of the larger protein). Such procedures not only insure desired regulation and high levels of expression of the exogenous gene product but also result in a degree of protection of the desired protein product from attack by proteases endogenous to the host. Further, depending on the host organisms, such procedures may allow for a kind of “piggyback” transportation of the desired protein from the host cells into the cell culture medium, eliminating the need to destroy host cells for the purpose of isolating the desired product.

While the foregoing generalized descriptions of published recombinant DNA methodologies may make the processes appear to be rather straightforward, easily performed and readily verified, it is actually the case that the DNA sequence manipulations involved are quite painstakingly difficult to perform and almost invariably characterized by very low yields of desired products.

As an example, the initial “preparation” of a gene for insertion into a vector to be used in transformation of a host microorganism can be an enormously difficult process, especially where the gene to be expressed is endogenous to a higher organism such as man. One laborious procedure practiced in the art is the systematic cloning into recombinant plasmids of the total DNA genome of the “donor” cells, generating immense “libraries” of transformed cells carrying random DNA sequence fragments which must be individually tested for expression of a product of interest. According to another procedure, total mRNA is isolated from high expression donor cells (presumptively containing multiple copies of mRNA coded for the product of interest), first “copied” into single stranded cDNA with reverse transcriptase enzymes, then into double stranded form with polymerase, and cloned. The procedure again generates a library of transformed cells, somewhat smaller than a total genome library, which may include the desired gene copies free of non-transcribed “introns” which can significantly interfere with expression by a host microorganism. The above-noted time-consuming gene isolation procedures were in fact employed in published recombinant DNA procedures for obtaining microorganism expression of several proteins, including rat proinsulin [Ullrich, et al., Science, 196, pp. 1313-1318 (1977)], human fibroblast interferon [Goedell, et al., Nucleic Acids Research, 8, pp. 4087-4094 (1980)], mouse β-endorphin [Shine, et al., Nature, 285, pp. 456-461 (1980)] and human leukocyte interferon [Goedell, et al., Nature, 287, pp. 411-416 (1980); and Goedell, et al., Nature, 290, pp. 20-26 (1981)].

Whenever possible, the partial or total manufacture of genes of interest from nucleotide bases constitutes a much preferred procedure for preparation of genes to be used in recombinant DNA methods. A requirement for such manufacture is, of course, knowledge of the correct amino acid sequence of the desired polypeptide. With this information in hand, a generative DNA sequence code for the protein (i.e., a properly ordered series of base triplet codons) can be planned and a corresponding synthetic, double stranded DNA segment can be constructed. A combination of manufacturing and cDNA synthetic methodologies is reported to have been employed in the generation of a gene for human growth hormone. Specifically, a manufactured linear double stranded DNA sequence of 72 nucleotide base pairs (comprising codons specifying the first 24 amino acids of the desired 191 amino acid polypeptide) was ligated to a cDNA-derived double strand coding for amino acids Nos. 25-191 and inserted in a modified pBR322 plasmid at a locus controlled by a lac promoter/regulator sequence [Goedell, et al., Nature, 281, pp. 544-548 (1979)].

Completely synthetic procedures have been employed for the manufacture of genes coding for relatively “short” biologically functional polypeptides, such as human somatostatin (14 amino acids) and human insulin (2 polypeptide chains of 21 and 30 amino acids, respectively).

In the somatostatin gene preparative procedure [Itakura, et al., Science, 198, pp. 1056-1063 (1977)] a 52 base pair gene was constructed wherein 42 base pairs represented the codons specifying the required 14 amino acids and an additional 10 base pairs were added to permit formation of “sticky-end” single stranded terminal regions employed for ligating the structural gene into a microorganism transformation vector.

Specifically, the gene was inserted close to the end of a β-galactosidase enzyme gene and the resultant fusion gene was expressed as a fusion protein from which somatostatin was isolated by cyanogen bromide cleavage. Manufacture of the human insulin gene, as noted above, involved preparation of genes coding for a 21 amino acid chain and for a 30 amino acid chain. Eighteen deoxyoligonucleotide fragments were combined to make the gene for the longer chain, and eleven fragments were joined into a gene for the shorter chain. Each gene was employed to form a fusion gene with a β-galactosidase gene and the individually expressed polypeptide chains were enzymatically isolated and linked to form complete insulin molecules. [Goedell, et al., Proc. Nat. Acad. Sci. U.S.A., 76, pp. 106-110 (1979).]

In each of the above procedures, deoxyoligonucleotide segments were prepared, and then sequentially ligated according to the following general procedure. [See, e.g., Agarwal, et al., Nature, 227, pp. 1-7 (1970) and Khorana, Science, 203, pp. 614-675 (1979)].

An initial “top” (i.e., 5′ to 3′ polarity) deoxyoligonucleotide segment is enzymatically joined to a second “top” segment. Alignment of these two “top” strands is made possible using a “bottom” (i.e., 3′ to 5′ polarity) strand having a base sequence complementary to half of the first top strand and half of the second top strand. After joining, the uncomplemented bases of the top strands “protrude” from the duplex portion formed. A second bottom strand is added which includes the five or six base complement of a protruding top strand, plus an additional five or six bases which then protrude as a bottom single stranded portion. The two bottom strands are then joined. Such sequential additions are continued until a complete gene sequence is developed, with the total procedure being very time-consuming and highly inefficient.

The time-consuming characteristics of such methods for total gene synthesis are exemplified by reports that three months' work by at least four investigators was needed to perform the assembly of the two “short” insulin genes previously referred to. Further, while only relatively small quantities of any manufactured gene are needed for success of vector insertion, the above synthetic procedures have such poor overall yields (on the order of 20% per ligation) that the eventual isolation of even minute quantities of a selected short gene is by no means guaranteed with even the most scrupulous adherence to prescribed methods. The maximum length gene which can be synthesized is clearly limited by the efficiency with which the individual short segments can be joined. If n such ligation reactions are required and the yield of each such reaction is y, the quantity of correctly synthesized genetic material obtained will be proportional to yⁿ. Since this relationship is exponential in nature, even a small increase in the yield per ligation reaction will result in a substantial increase in the length of the largest gene that may be synthesized.
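The exponential dependence of overall yield on the number of ligations may be made concrete with a short Python sketch. The 20% per-ligation figure is taken from the text; the 40% comparison figure and the cutoff of one part in a thousand of starting material are assumptions chosen purely for illustration, as are the function names.

```python
def overall_yield(y: float, n: int) -> float:
    """Fraction of starting material surviving n ligations, each of yield y."""
    return y ** n

def max_ligations(y: float, min_fraction: float) -> int:
    """Largest number of ligations that keeps the overall yield at or above min_fraction."""
    if not (0.0 < y < 1.0) or not (0.0 < min_fraction <= 1.0):
        raise ValueError("expected 0 < y < 1 and 0 < min_fraction <= 1")
    n = 0
    remaining = 1.0
    # Keep adding ligation steps while one more step stays above the cutoff.
    while remaining * y >= min_fraction:
        remaining *= y
        n += 1
    return n

print(overall_yield(0.2, 3))      # three ligations at 20% each, about 0.008
print(max_ligations(0.2, 1e-3))   # -> 4
print(max_ligations(0.4, 1e-3))   # -> 7
```

Under these assumed numbers, doubling the per-ligation yield from 20% to 40% raises the tolerable number of ligations from four to seven, which is the kind of gain the preceding paragraph describes.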

Inefficiencies in the above-noted methodology are due in large part to the formation of undesired intermediate products. As an example, in an initial reaction forming annealed top strands associated with a bottom, “template” strand, the desired reaction may be,

but the actual products obtained may be

or the like. Further, the longer the individual deoxyoligonucleotides are, the more likely it is that they will form thermodynamically stable self-associations such as “hair-pins” or aggregations.

Proposals for increasing synthetic efficiency have not been forthcoming and it was recently reported that, “With the methods now available, however, it is not economically practical to synthesize genes for peptides longer than about 30 amino acid units, and many clinically important proteins are much longer”. [Aharonowitz, et al., Scientific American, 245, No. 3, pp. 140-152, at p. 151 (1981).]

An illustration of the “economic practicalities” involved in large gene synthesis is provided by the recent publication of “successful” efforts in the total synthesis of a human leukocyte interferon gene [Edge, et al., Nature, 292, pp. 756-782 (1981)]. Briefly summarized, 67 different deoxyoligonucleotides containing about 15 bases were synthesized and joined in the “50 percent overlap” procedure of the type noted above to form eleven short duplexes. These, in turn, were assembled into four longer duplexes which were eventually joined to provide a 514 base pair gene coding for the 166 amino acid protein. The procedure, which the authors characterize as “rapid”, is reliably estimated to have consumed nearly a year's effort by five workers and the efficiency of the assembly strategy was clearly quite poor. It may be noted, for example, that while 40 pmole of each of the starting 67 deoxyoligonucleotides was prepared and employed to form the eleven intermediate-sized duplexes, by the time assembly of the four large duplexes was achieved, a yield of only about 0.01 pmole of the longer duplexes could be obtained for use in final assembly of the whole gene.

Another aspect of the practice of recombinant DNA techniques for the expression, by microorganisms, of proteins of industrial and pharmaceutical interest is the phenomenon of “codon preference”. While it was earlier noted that the existing machinery for gene expression in genetically transformed host cells will “operate” to construct a given desired product, levels of expression attained in a microorganism can be subject to wide variation, depending in part on specific alternative forms of the amino acid-specifying genetic code present in an inserted exogenous gene. A “triplet” codon of four possible nucleotide bases can exist in 64 variant forms. That these forms provide the message for only 20 different amino acids (as well as translation initiation and termination) means that some amino acids can be coded for by more than one codon. Indeed, some amino acids have as many as six “redundant”, alternative codons while some others have a single, required codon. For reasons not completely understood, alternative codons are not at all uniformly present in the endogenous DNA of differing types of cells and there appears to exist a variable natural hierarchy or preference for certain codons in certain types of cells.
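The codon arithmetic in the preceding paragraph (64 triplets, 20 amino acids, between one and six codons per amino acid) can be checked with a brief Python sketch. It assumes the standard genetic code written as one-letter symbols in TCAG codon order; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of alternative codons available for each amino acid (and for stop).
degeneracy = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))                               # 4 ** 3 = 64 triplets
print(sorted(aa for aa, k in degeneracy.items() if k == 6))
print(sorted(aa for aa, k in degeneracy.items() if k == 1))
```

In the standard code the six-codon amino acids are leucine, arginine and serine, and the single-codon amino acids are methionine and tryptophan.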

As one example, the amino acid leucine is specified by any of six DNA codons including CTA, CTC, CTG, CTT, TTA, and TTG (which correspond, respectively, to the mRNA codons, CUA, CUC, CUG, CUU, UUA and UUG). Exhaustive analysis of genome codon frequencies for microorganisms has revealed that endogenous DNA of E. coli bacteria most commonly contains the CTG leucine-specifying codon, while the DNA of yeasts and slime molds most commonly includes a TTA leucine-specifying codon. In view of this hierarchy, it is generally held that the likelihood of obtaining high levels of expression of a leucine-rich polypeptide by an E. coli host will depend to some extent on the frequency of codon use. For example, a gene rich in TTA codons will in all probability be poorly expressed in E. coli, whereas a CTG-rich gene will probably be highly expressed. In a like manner, when yeast cells are the projected transformation host cells for expression of a leucine-rich polypeptide, a preferred codon for use in an inserted DNA would be TTA. See, e.g., Grantham, et al., Nucleic Acids Research, 8, pp. r49-62 (1980); Grantham, et al., Nucleic Acids Research, 8, pp. 1893-1912 (1980); and, Grantham, et al., Nucleic Acids Research, 9, pp. r43-74 (1981).
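By way of illustration only, the leucine example above can be expressed as a small Python sketch that rewrites every leucine codon of a coding sequence to the codon stated in the text to be most common in the projected host (CTG for E. coli, TTA for yeast). The function name, the host labels and the short test sequences are assumptions introduced for this sketch; only the six leucine codons and the two host preferences come from the text.

```python
# The six leucine-specifying DNA codons listed in the text.
LEUCINE_CODONS = {"CTA", "CTC", "CTG", "CTT", "TTA", "TTG"}

# Most common leucine codon per host, as stated in the text.
PREFERRED_LEU = {"E. coli": "CTG", "yeast": "TTA"}

def recode_leucine(coding_dna: str, host: str) -> str:
    """Replace every leucine codon with the host's preferred leucine codon.

    The encoded amino acid sequence is unchanged, because all six codons
    specify leucine; only the choice among synonymous codons differs.
    """
    preferred = PREFERRED_LEU[host]
    dna = coding_dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(preferred if codon in LEUCINE_CODONS else codon for codon in codons)

print(recode_leucine("TTAATGCTT", "E. coli"))  # -> CTGATGCTG
print(recode_leucine("CTGATGCTC", "yeast"))    # -> TTAATGTTA
```

A complete codon-selection scheme would apply the same substitution to every amino acid having alternative codons; the sketch is limited to leucine because that is the only preference the text quantifies.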

The implications of codon preference phenomena on recombinant DNA techniques are manifest, and the phenomenon may serve to explain many prior failures to achieve high expression levels for exogenous genes in successfully transformed host organisms: a less “preferred” codon may be repeatedly present in the inserted gene and the host cell machinery for expression may not operate as efficiently. This phenomenon directs the conclusion that wholly manufactured genes which have been designed to include a projected host cell's preferred codons provide a preferred form of foreign genetic material for practice of recombinant DNA techniques. In this context, the absence of procedures for rapid and efficient total gene manufacture which would permit codon selection is seen to constitute an even more serious roadblock to advances in the art.

Of substantial interest to the background of the present invention is the state of the art with regard to the preparation and use of a class of biologically active substances, the interferons (IFNs). Interferons are secreted proteins having fairly well-defined antiviral, antitumor and immunomodulatory characteristics. See, e.g., Gray, et al., Nature, 295, pp. 503-508 (1982) and Edge, et al., supra, and references cited therein.

On the basis of antigenicity and biological and chemical properties, human interferons have been grouped into three major classes: IFN-α (leukocyte), IFN-β (fibroblast) and IFN-γ (immune). Considerable information has accumulated on the structures and properties of the virus-induced acid-stable interferons (IFN-α and β). These have been purified to homogeneity and at least partial amino acid sequences have been determined. Analyses of cloned cDNA and gene sequences for IFN-β₁ and the IFN-α multigene family have permitted the deduction of the complete amino acid sequences of many of the interferons. In addition, efficient synthesis of IFN-β₁ and several IFN-αs in E. coli, and IFN-α₁ in yeast, has now made possible the purification of large quantities of these proteins in biologically active form.

Much less information is available concerning the structure and properties of IFN-γ, an interferon generally produced in cultures of lymphocytes exposed to various mitogenic stimuli. It is acid labile and does not cross-react with antisera prepared against IFN-α or IFN-β. A broad range of biological activities has been attributed to IFN-γ including potentiation of the antiviral activities of IFN-α and β, from which it differs in terms of its virus and cell specificities and the antiviral mechanisms induced. In vitro studies performed with crude preparations suggest that the primary function of IFN-γ may be as an immunoregulatory agent. The antiproliferative effect of IFN-γ on transformed cells has been reported to be 10 to 100-fold greater than that of IFN-α or β, suggesting a potential use in the treatment of neoplasia. Murine IFN-γ preparations have been shown to have significant antitumor activity against mouse sarcomas.

It has recently been reported (Gray, et al., supra) that a recombinant plasmid containing a cDNA sequence coding for human IFN-γ has been isolated and characterized. Expression of this sequence in E. coli and cultured monkey cells is reported to give rise to a polypeptide having the properties of authentic human IFN-γ. In the publication, the cDNA sequence and the deduced 146 amino acid sequence of the “mature” polypeptide, exclusive of the putative leader sequence, are given as follows:

Residues 1-12:
Cys-Tyr-Cys-Gln-Asp-Pro-Tyr-Val-Lys-Glu-Ala-Glu
TGT TAC TGC CAG CAG CAA TAT GTA AAA GAA GCA GAA

Residues 13-24:
Asn-Leu-Lys-Lys-Tyr-Phe-Asn-Ala-Gly-His-Ser-Asp
AAC CTT AAG AAA TAT TTT AAT GCA GGT CAT TCA GAT

Residues 25-36:
Val-Ala-Asp-Asn-Gly-Thr-Leu-Phe-Leu-Gly-Ile-Leu
GTA GCG GAT AAT GGA ACT CTT TTC TTA GGC ATT TTG

Residues 37-48:
Lys-Asn-Trp-Lys-Glu-Glu-Ser-Asp-Arg-Lys-Ile-Met
AAG AAT TGG AAA GAG GAG AGT GAC AGA AAA ATA ATG

Residues 49-60:
Gln-Ser-Gln-Ile-Val-Ser-Phe-Tyr-Phe-Lys-Leu-Phe
CAG AGC CAA ATT GTC TCC TTT TAC TTC AAA CTT TTT

Residues 61-72:
Lys-Asn-Phe-Lys-Asp-Asp-Gln-Ser-Ile-Gln-Lys-Ser
AAA AAC TTT AAA GAT GAC CAG AGC ATC CAA AAG AGT

Residues 73-84:
Val-Glu-Thr-Ile-Lys-Glu-Asp-Met-Asn-Val-Lys-Phe
GTG GAG ACC ATC AAG GAA GAC ATG AAT GTC AAG TTT

Residues 85-96:
Phe-Asn-Ser-Asn-Lys-Lys-Lys-Arg-Asp-Asp-Phe-Glu
TTC AAT AGC AAC AAA AAG AAA CGA GAT GAC TTC GAA

Residues 97-108:
Lys-Leu-Thr-Asn-Tyr-Ser-Val-Thr-Asp-Leu-Asn-Val
AAG CTG ACT AAT TAT TCG GTA ACT GAC TTG AAT GTC

Residues 109-120:
Gln-Arg-Lys-Ala-Ile-His-Glu-Leu-Ile-Gln-Val-Met
CAA CGC AAA GCA ATA CAT GAA CTC CTC ATC CAA ATG

Residues 121-132:
Ala-Glu-Leu-Ser-Pro-Ala-Ala-Lys-Thr-Gly-Lys-Arg
GCT GAA CTG TCG CAA GCA GCT AAA ACA GGG AAG CGA

Residues 133-144:
Lys-Arg-Ser-Gln-Met-Leu-Phe-Gln-Gly-Arg-Arg-Ala
AAA AGG AGT CAG ATG CTG TTT CAA GGT CGA AGA GCA

Residues 145-146:
Ser-Gln
TCC CAG.

In a previous publication of the sequence, arginine, rather than glutamine, was specified at position 140 in the sequence. (Unless otherwise indicated, therefore, reference to “human immune interferon” or, simply “IFN-γ” shall comprehend both the [Arg¹⁴⁰] and [Gln¹⁴⁰] forms.)

The above-noted wide variations in biological activities of various interferon types make the construction of synthetic polypeptide analogs of the interferons of paramount significance to the full development of the therapeutic potential of this class of compounds. Despite the advantages in isolation of quantities of interferons which have been provided by recombinant DNA techniques to date, practitioners in this field have not been able to address the matter of preparation of synthetic polypeptide analogs of the interferons with any significant degree of success.

Put another way, the work of Gray, et al., supra, in the isolation of a gene coding for IFN-γ and the extensive labors of Edge, et al., supra, in providing a wholly manufactured IFN-α₁ gene provide only genetic materials for expression of single, very precisely defined, polypeptide sequences. There exist no procedures (except, possibly, for site specific mutagenesis) which would permit microbial expression of large quantities of human IFN-α analogs which differed from the “authentic” polypeptide in terms of the identity or location of even a single amino acid. In a like manner, preparation of an IFN-α analog which differed by one amino acid from the polypeptide prepared by Edge, et al., supra, would appear to require an additional year of labor in constructing a whole new gene which varied in terms of a single triplet codon. No means is readily available for the excision of a fragment of the subject gene and replacement with a fragment including the coding information for a variant polypeptide sequence. Further, modification of the reported cDNA-derived and manufactured DNA sequences to vary codon usage is not an available “option”.

Indeed, the only report of the preparation of variant interferon polypeptide species by recombinant DNA techniques has been in the context of preparation and expression of “hybrids” of human genes for IFN-α₁ and IFN-α₂ [Weck, et al., Nucleic Acids Research, 9, pp. 6153-6168 (1981) and Streuli, et al., Proc. Nat. Acad. Sci. U.S.A., 78, pp. 2848-2852 (1981)]. The hybrids obtained consisted of the four possible combinations of gene fragments developed upon finding that two of the eight human (cDNA-derived) genes fortuitously included, only once within the sequence, base sequences corresponding to the restriction endonuclease cleavage sites for the bacterial endonucleases PvuII and BglII.

There exists, therefore, a substantial need in the art for more efficient procedures for the total synthesis from nucleotide bases of manufactured DNA sequences coding for large polypeptides such as the interferons. There additionally exists a need for synthetic methods which will allow for the rapid construction of variant forms of synthetic sequences such as will permit the microbial expression of synthetic polypeptides which vary from naturally occurring forms in terms of the identity and/or position of one or more selected amino acids.

BRIEF SUMMARY

The present invention provides novel, rapid and highly efficient procedures for the total synthesis of linear, double stranded DNA sequences in excess of about 200 nucleotide base pairs in length, which sequences may comprise entire structural genes capable of directing the synthesis of a wide variety of polypeptides of interest.

According to the invention, linear, double stranded DNA sequences of a length in excess of about 200 base pairs and coding for expression of a predetermined continuous sequence of amino acids within a selected host microorganism transformed by a selected DNA vector including the sequence, are synthesized by a method comprising:

(a) preparing two or more different, subunit, linear, double stranded DNA sequences of about 100 or more base pairs in length for assembly in a selected assembly vector,

each different subunit DNA sequence prepared comprising a series of nucleotide base codons coding for a different continuous portion of said predetermined sequence of amino acids to be expressed,

one terminal region of a first of said subunits comprising a portion of a base sequence which provides a recognition site for cleavage by a first restriction endonuclease, which recognition site is entirely present either once or not at all in said selected assembly vector upon insertion of the subunit therein,

one terminal region of a second of said subunits comprising a portion of a base sequence which provides a recognition site for cleavage by a second restriction endonuclease other than said first endonuclease, which recognition site is entirely present once or not at all in said selected assembly vector upon insertion of the subunit therein,

at least one-half of all remaining terminal regions of subunits comprising a portion of a recognition site (preferably a palindromic six base recognition site) for cleavage by a restriction endonuclease other than said first and second endonucleases, which recognition site is entirely present once and only once in said selected assembly vector after insertion of all subunits thereinto; and

(b) serially inserting each of said subunit DNA sequences prepared in step (a) into the selected assembly vector and effecting the biological amplification of the assembly vector subsequent to each insertion, thereby to form a DNA vector including the desired DNA sequence coding for the predetermined continuous amino acid sequence and wherein the desired DNA sequence assembled includes at least one unique, preferably palindromic six base, recognition site for restriction endonuclease cleavage at an intermediate position therein.
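The requirement in the method above that a recognition site be palindromic and be present once and only once in the assembled vector may be checked along the lines of the following Python sketch. The helper names and the short test sequence are hypothetical and do not correspond to any vector of the disclosure; GAATTC and GGATCC are used only as familiar examples of palindromic six base recognition sites (those of EcoRI and BamHI, respectively).

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the complementary strand read 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))

def is_palindromic(site: str) -> bool:
    """A site is palindromic when it reads the same on both strands."""
    return site.upper() == reverse_complement(site)

def count_site(sequence: str, site: str) -> int:
    """Count occurrences of site in sequence, including overlapping ones."""
    sequence, site = sequence.upper(), site.upper()
    count, start = 0, sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count

def is_unique_site(sequence: str, site: str) -> bool:
    """True when the site occurs once and only once in a linear sequence.

    For a palindromic site, scanning one strand is sufficient, because every
    occurrence on the other strand lies at the same position.
    """
    return count_site(sequence, site) == 1

# Hypothetical linear test sequence containing one GAATTC and one GGATCC site.
test_sequence = "AAGAATTCGGGGATCCTT"
print(is_palindromic("GAATTC"), is_unique_site(test_sequence, "GAATTC"))
print(is_palindromic("GGATCC"), is_unique_site(test_sequence, "GGATCC"))
```

The sketch treats the sequence as linear; a check on a circular plasmid would additionally have to examine sites spanning the point at which the circle is opened.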

The above general method preferably further includes the step of isolating the desired DNA sequence from the assembly vector, preferably to provide one of the class of novel manufactured DNA sequences having at least one unique palindromic six base recognition site for restriction endonuclease cleavage at an intermediate position therein. A sequence so isolated may then be inserted in a different, “expression” vector and direct expression of the desired polypeptide by a microorganism which is the same as or different from that in which the assembly vector is amplified. In other preferred embodiments of the method: at least three different subunit DNA sequences are prepared in step (a) and serially inserted into said selected assembly vector in step (b) and the desired manufactured DNA sequence obtained includes at least two unique palindromic six base recognition sites for restriction endonuclease cleavage at intermediate positions therein; the DNA sequence synthesized comprises an entire structural gene coding for a biologically active polypeptide; and, in the DNA sequence manufactured, the sequence of nucleotide bases includes one or more codons selected, from among alternative codons specifying the same amino acid, on the basis of preferential expression characteristics of the codon in said selected host microorganism.

Novel products of the invention include manufactured, linear, double stranded DNA sequences of a length in excess of about 200 base pairs and coding for the expression of a predetermined continuous sequence of amino acids by a selected host microorganism transformed with a selected DNA vector including the sequence, characterized by having at least one unique palindromic six base recognition site for restriction endonuclease cleavage at an intermediate position therein. Also included are polypeptide products of the expression by an organism of such manufactured sequences.

Illustratively provided by the present invention are novel manufactured genes coding for the synthesis of human immune interferon (IFN-γ) and novel biologically functional analog polypeptides which differ from human immune interferon in terms of the identity and/or location of one or more amino acids. Also provided are manufactured genes coding for synthesis of human leukocyte interferon of the F subtype (“LeIFN-F” or “IFN-αF”) and analogs thereof, along with consensus human leukocyte interferons.

DNA subunit sequences for use in practice of the methods of the invention are preferably synthesized from nucleotide bases according to the methods disclosed in co-owned, concurrently-filed U.S. Pat. No. 4,652,639, by Yitzhak Stabinsky, entitled “Manufacture and Expression of Structural Genes”. Briefly summarized, the general method comprises the steps of:

-   (1) preparing two or more different, linear, duplex DNA strands, each duplex strand including a double stranded region of 12 or more selected complementary base pairs and further including a top single stranded terminal sequence of from 3 to 7 selected bases at one end of the strand and/or a bottom single stranded terminal sequence of from 3 to 7 selected bases at the other end of the strand, each single stranded terminal sequence of each duplex DNA strand comprising the entire base complement of at most one single stranded terminal sequence of any other duplex DNA strand prepared; and
-   (2) annealing each duplex DNA strand prepared in step (1) to one or two different duplex strands prepared in step (1) having a complementary single stranded terminal sequence, thereby to form a single continuous double stranded DNA sequence which has a duplex region of at least 27 selected base pairs including at least 3 base pairs formed by complementary association of single stranded terminal sequences of duplex DNA strands prepared in step (1) and which has from 0 to 2 single stranded top or bottom terminal regions of from 3 to 7 bases.

In the preferred general process for subunit manufacture, at least three different duplex DNA strands are prepared in step (1) and all strands so prepared are annealed concurrently in a single annealing reaction mixture to form a single continuous double stranded DNA sequence which has a duplex region of at least 42 selected base pairs including at least two nonadjacent sets of 3 or more base pairs formed by complementary association of single stranded terminal sequences of duplex strands prepared in step (1).

The duplex DNA strand preparation step (1) of the preferred subunit manufacturing process preferably comprises the steps of:

-   (a) constructing first and second linear deoxyoligonucleotide segments having 15 or more bases in a selected linear sequence, the linear sequence of bases of the second segment comprising the total complement of the sequence of bases of the first segment except that at least one end of the second segment shall either include an additional linear sequence of from 3 to 7 selected bases beyond those fully complementing the first segment, or shall lack a linear sequence of from 3 to 7 bases complementary to a terminal sequence of the first segment, provided, however, that the second segment shall not have an additional sequence of bases or be lacking a sequence of bases at both of its ends; and,
-   (b) combining the first and second segments under conditions conducive to complementary association between segments to form a linear, duplex DNA strand.

The sequence of bases in the double stranded DNA subunit sequences formed preferably includes one or more triplet codons selected from among alternative codons specifying the same amino acid on the basis of preferential expression characteristics of the codon in a projected host microorganism, such as yeast cells or bacteria, especially E. coli bacteria.

Also provided by the present invention are improvements in methods and materials for enhancing levels of expression of selected exogenous genes in E. coli host cells. Briefly stated, expression vectors are constructed to include selected DNA sequences upstream of polypeptide-coding regions which selected sequences are duplicative of ribosome binding site sequences extant in genomic E. coli DNA associated with highly expressed endogenous polypeptides. A presently preferred selected sequence is that associated with E. coli expression of outer membrane protein F (“OMP-F”).

Other aspects and advantages of the present invention will be apparent upon consideration of the following detailed description thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1: Depicts the major steps in the general procedure for assembly of human IFN-γ specifying genes from subunits IF-1, IF-2, and IF-3.

FIGS. 2A-2C: Depict the deduced sequences of thirteen IFN-α subtypes.

DETAILED DESCRIPTION

As employed herein, the term “manufactured” as applied to a DNA sequence or gene shall designate a product either totally chemically synthesized by assembly of nucleotide bases or derived from the biological replication of a product thus chemically synthesized. As such, the term is exclusive of products “synthesized” by cDNA methods or genomic cloning methodologies which involve starting materials which are of biological origin. Table I below sets out abbreviations employed herein to designate amino acids and includes IUPAC-recommended single letter designations.

TABLE I

Amino Acid       Abbreviation    IUPAC Symbol
Alanine          Ala             A
Cysteine         Cys             C
Aspartic acid    Asp             D
Glutamic acid    Glu             E
Phenylalanine    Phe             F
Glycine          Gly             G
Histidine        His             H
Isoleucine       Ile             I
Lysine           Lys             K
Leucine          Leu             L
Methionine       Met             M
Asparagine       Asn             N
Proline          Pro             P
Glutamine        Gln             Q
Arginine         Arg             R
Serine           Ser             S
Threonine        Thr             T
Valine           Val             V
Tryptophan       Trp             W
Tyrosine         Tyr             Y

The following abbreviations shall be employed for nucleotide bases: A for adenine; G for guanine; T for thymine; U for uracil; and C for cytosine.

For ease of understanding of the present invention, Tables II and III below provide tabular correlation between the 64 alternative triplet nucleotide base codons of DNA and the 20 amino acids and translation termination (“stop”) functions specified thereby. In order to determine the corresponding correlations for RNA, U is substituted for T in the tables.

TABLE II

FIRST              SECOND POSITION              THIRD
POSITION     T       C       A       G       POSITION
T            Phe     Ser     Tyr     Cys     T
             Phe     Ser     Tyr     Cys     C
             Leu     Ser     Stop    Stop    A
             Leu     Ser     Stop    Trp     G
C            Leu     Pro     His     Arg     T
             Leu     Pro     His     Arg     C
             Leu     Pro     Gln     Arg     A
             Leu     Pro     Gln     Arg     G
A            Ile     Thr     Asn     Ser     T
             Ile     Thr     Asn     Ser     C
             Ile     Thr     Lys     Arg     A
             Met     Thr     Lys     Arg     G
G            Val     Ala     Asp     Gly     T
             Val     Ala     Asp     Gly     C
             Val     Ala     Glu     Gly     A
             Val     Ala     Glu     Gly     G

TABLE III

Amino Acid            Specifying Codon(s)
(A) Alanine           GCT, GCC, GCA, GCG
(C) Cysteine          TGT, TGC
(D) Aspartic acid     GAT, GAC
(E) Glutamic acid     GAA, GAG
(F) Phenylalanine     TTT, TTC
(G) Glycine           GGT, GGC, GGA, GGG
(H) Histidine         CAT, CAC
(I) Isoleucine        ATT, ATC, ATA
(K) Lysine            AAA, AAG
(L) Leucine           TTA, TTG, CTT, CTC, CTA, CTG
(M) Methionine        ATG
(N) Asparagine        AAT, AAC
(P) Proline           CCT, CCC, CCA, CCG
(Q) Glutamine         CAA, CAG
(R) Arginine          CGT, CGC, CGA, CGG, AGA, AGG
(S) Serine            TCT, TCC, TCA, TCG, AGT, AGC
(T) Threonine         ACT, ACC, ACA, ACG
(V) Valine            GTT, GTC, GTA, GTG
(W) Tryptophan        TGG
(Y) Tyrosine          TAC, TAT
STOP                  TAA, TAG, TGA
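By way of illustration only, the correlations of Tables II and III may be expressed in executable form. The following Python sketch is not part of the original disclosure; the names `CODON_TABLE`, `translate` and `synonymous_codons` are hypothetical, and the symbol `*` is an assumed shorthand for the “stop” function.

```python
from itertools import product

# Bases in the row and column order used by Table II.
BASES = "TCAG"

# One-letter symbols (Table I) for the 64 codons, listed with the first
# position varying slowest and the third position varying fastest.
# "*" is an illustrative shorthand for "stop".
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA top strand (5' to 3') codon by codon."""
    dna = dna.upper().replace("-", "")
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_codons(amino_acid):
    """List every codon specifying the given one-letter symbol (Table III)."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    print(translate("TGT-TAC-TGC-CAG"))
    print(synonymous_codons("L"))
```

The list returned by `synonymous_codons` is the set of alternatives from which a codon may be chosen on the basis of host preference; the choice itself is not modelled here.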

A “palindromic” recognition site for restriction endonuclease cleavage of double stranded DNA is one which displays “left-to-right and right-to-left” symmetry between top and bottom base complements, i.e., where “readings” of complementary base sequences of the recognition site from 5′ to 3′ ends are identical. Examples of palindromic six base recognition sites for restriction endonuclease cleavage include the sites for cleavage by HindIII wherein top and bottom strands read from 5′ to 3′ as AAGCTT. A non-palindromic six base restriction site is exemplified by the site for cleavage by EcoP15, the top strand of which reportedly reads CAGCAG. The bottom strand base complement, when read 5′ to 3′, is CTGCTG. Essentially by definition, restriction sites comprising odd numbers of bases (e.g., 5, 7) are non-palindromic. Certain endonucleases will cleave at variant forms of a site, which may be palindromic or not. For example, XhoII will recognize a site which reads (any purine)GATC(any pyrimidine) including the palindromic sequence AGATCT and the non-palindromic sequence GGATCT. Referring to the previously-noted “BRL Restriction Endonuclease Reference Chart,” endonucleases recognizing six base palindromic sites exclusively include BbrI, ChuI, Hin173, Rin91R, HinbIII, HinbIII, HindIII, HinfII, HsuI, BglII, StuI, RruI, ClaI, AvaIII, PvuII, SmsI, XmaI, EccI, SacII, SboI, SbrI, ShyI, SstII, TglI, AvrII, PvuI, RshI, RspI, XniI, XorII, XmaIII, BluI, MsiI, ScuI, SexI, SgoI, SlaI, SluI, SpaI, XhoI, XpaI, Bce170, Bsu1247, PstI, SalPI, XmaII, XorI, EcoRI, Rsh630I, SacI, SstI, SphI, BamEI, BamKI, BamNI, BamFI, BstI, KpnI, SalI, XamI, HpaI, XbaI, AtuCI, BclI, CpeI, SstIV, AosI, MstI, BalI, AsuII, and MlaI. Endonucleases which recognize only non-palindromic six base sequences exclusively include Tth111II, EcoP15, AvaI, and AvrI. Endonucleases recognizing both palindromic and non-palindromic six base sequences include HaeI, HgiAI, AcyI, AosII, AsuIII, AccI, ChuII, HincII, HindII, MnnI, XhoII, HaeII, HinHI, NgoI, and EcoRI′.
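The definition of a palindromic site given above lends itself to a simple test: a site is palindromic when the top strand and the bottom strand, each read 5′ to 3′, are identical. The following Python sketch is offered as an illustration only; the function names are hypothetical and only the four DNA bases are handled.

```python
# Watson-Crick complements of the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the bottom strand of `seq`, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(site):
    """True when top and bottom strands read identically from 5' to 3'."""
    return site.upper() == reverse_complement(site)


if __name__ == "__main__":
    print("HindIII AAGCTT palindromic:", is_palindromic("AAGCTT"))
    print("EcoP15 bottom strand:", reverse_complement("CAGCAG"))
    print("XhoII variant GGATCT palindromic:", is_palindromic("GGATCT"))
```

Because no base is its own complement, the middle base of an odd-length site can never match itself on the opposite strand, which is why `is_palindromic` returns false for every site with an odd number of bases.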

Upon determination of the structure of a desired polypeptide to be produced, practice of the present invention involves: preparation of two or more different specific, continuous double stranded DNA subunit sequences of 100 or more base pairs in length and having terminal portions of the proper configuration; serial insertion of subunits into a selected assembly vector with intermediate amplification of the hybrid vectors in a selected host organism; use of the assembly vector (or an alternate, selected “expression” vector including the DNA sequence which has been manufactured from the subunits) to transform a suitable, selected host; and, isolating polypeptide sequences expressed in the host organism. In its most efficient forms, practice of the invention involves using the same vector for assembly of the manufactured sequence and for large scale expression of the polypeptide. Similarly, the host microorganism employed for expression will ordinarily be the same as employed for amplifications performed during the subunit assembly process.

The manufactured DNA sequence may be provided with a promoter/regulator region for autonomous control of expression or may be incorporated into a vector in a manner providing for control of expression by a promoter/regulator sequence extant in the vector. Manufactured DNA sequences of the invention may suitably be incorporated into existing plasmid-borne genes (e.g., β-galactosidase) to form fusion genes coding for fusion polypeptide products including the desired amino acid sequences coded for by the manufactured DNA sequences.

In practice of the invention in its preferred forms, polypeptides produced may vary in size from about 65 or 70 amino acids up to about 200 or more amino acids. High levels of expression of the desired polypeptide by selected transformed host organisms are facilitated through the manufacture of DNA sequences which include one or more alternative codons which are preferentially expressed by the host.

Manufacture of double stranded subunit DNA sequences of 100 to 200 base pairs in length may proceed according to prior art assembly methods previously referred to, but is preferably accomplished by means of the rapid and efficient procedures disclosed in the aforementioned U.S. Pat. No. 4,652,639 by Stabinsky and used in certain of the following examples of actual practice of the present invention. Briefly put, these procedures involve the assembly from deoxyoligonucleotides of two or more different, linear, duplex DNA strands each including a relatively long double stranded region along with a relatively short single stranded region on one or both opposing ends of the double strand. The double stranded regions are designed to include codons needed to specify assembly of an initial, or terminal or intermediate portion of the total amino acid sequence of the desired polypeptide. Where possible, alternative codons preferentially expressed by a projected host (e.g., E. coli) are employed. Depending on the relative position to be assumed in the finally assembled subunit DNA sequence, the single stranded region(s) of the duplex strands will include a sequence of bases which, when complemented by bases of other duplex strands, also provide codons specifying amino acids within the desired polypeptide sequence.

Duplex strands formed according to this procedure are then enzymatically annealed to the one or two different duplex strands having complementary short, single stranded regions to form a desired continuous double stranded subunit DNA sequence which codes for the desired polypeptide fragment.

High efficiencies and rapidity in total sequence assembly are augmented in such procedures by performing a single annealing reaction involving three or more duplex strands, the short, single stranded regions of which constitute the base complement of at most one other single stranded region of any other duplex strand. Providing all duplex strands formed with short single stranded regions which uniquely complement only one of the single stranded regions of any other duplex is accomplished by alternative codon selection within the context of genetic code redundancy, and preferably also in the context of codon preferences of the projected host organism.
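The requirement that each short single stranded region complement at most one single stranded region of any other duplex can be checked mechanically before synthesis. The Python sketch below is an illustration under stated assumptions: each duplex is represented only by the single stranded terminal sequences it carries, each written 5′ to 3′, and the duplex names and four-base sequences shown are invented for the example.

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq):
    """Return the complementary strand of `seq`, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def complementary_partners(duplexes):
    """Map each (duplex, end) to the ends of other duplexes that complement it.

    `duplexes` maps a duplex name to a list of its single stranded terminal
    sequences, each written 5' to 3'.
    """
    partners = {}
    for name, ends in duplexes.items():
        for end in ends:
            wanted = reverse_complement(end)
            partners[(name, end)] = [
                (other, other_end)
                for other, other_ends in duplexes.items()
                if other != name
                for other_end in other_ends
                if other_end.upper() == wanted
            ]
    return partners


def uniquely_paired(duplexes):
    """True when every single stranded end has at most one possible partner."""
    return all(len(p) <= 1 for p in complementary_partners(duplexes).values())


if __name__ == "__main__":
    # Hypothetical duplexes: D1 joins D2, and D2 joins D3.
    design = {"D1": ["AAGC"], "D2": ["GCTT", "TCCA"], "D3": ["TGGA"]}
    print(uniquely_paired(design))
    # A fourth duplex that repeats an existing end creates an ambiguity.
    print(uniquely_paired(dict(design, D4=["GCTT"])))
```

When the check fails, the text above indicates the remedy: a synonymous codon is substituted within the offending single stranded region so that the end becomes distinct while the encoded amino acid is unchanged.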

The following description of the manufacture of a hypothetical long DNA sequence coding for a hypothetical polypeptide will serve to graphically illustrate practice of the invention, especially in the context of formation of proper terminal sequences on subunit DNA sequences.

A biologically active polypeptide of interest is isolated and its amino acids are sequenced to reveal a constitution of 100 amino acid residues in a given continuous sequence. Formation of a manufactured gene for microbial expression of the polypeptide will thus require assembly of at least 300 base pairs for insertion into a selected viral or circular plasmid DNA vector to be used for transformation of a selected host organism.

A preliminary consideration in construction of the manufactured gene is the identity of the projected microbial host, because foreknowledge of the host allows for codon selection in the context of codon preferences of the host species. For purposes of this discussion, the selection of an E. coli bacterial host is posited.

A second consideration in construction of the manufactured gene is the identity of the projected DNA vector employed in the assembly process. Selection of a suitable vector is based on existing knowledge of sites for cleavage of the vector by restriction endonuclease enzymes. More particularly, the assembly vector is selected on the basis of including DNA sequences providing endonuclease cleavage sites which will permit easy insertion of the subunits. In this regard, the assembly vector selected preferably has at least two restriction sites which occur only once (i.e., are “unique”) in the vector prior to performance of any subunit insertion processes. For the purposes of this description, the selection of a hypothetical circular DNA plasmid pBR3000 having a single EcoRI restriction site, i.e.,

-GAATTC-
-CTTAAG-

and a single PvuII restriction site, i.e.,

-CAGCTG-
-GTCGAC-

is posited.
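Whether a candidate assembly vector carries a given recognition site exactly once can be determined by counting occurrences around the circular sequence. The following Python sketch is illustrative only; `TOY_PLASMID` is an invented 30-base stand-in for the hypothetical pBR3000, and the function names are likewise hypothetical.

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_overlapping(text, pattern):
    """Count occurrences of `pattern` in `text`, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def count_sites_circular(plasmid, site):
    """Count recognition sites on a circular top strand.

    The sequence is extended by len(site) - 1 bases so that a site spanning
    the arbitrary origin is found. For a non-palindromic site the reverse
    complement is searched as well, since the site may lie on either strand.
    """
    plasmid, site = plasmid.upper(), site.upper()
    circular = plasmid + plasmid[:len(site) - 1]
    patterns = {site, reverse_complement(site)}  # one entry if palindromic
    return sum(count_overlapping(circular, p) for p in patterns)


def unique_sites(plasmid, sites):
    """Names of the enzymes whose site occurs exactly once in the plasmid."""
    return [n for n, s in sites.items() if count_sites_circular(plasmid, s) == 1]


# Invented sequence for illustration; not a real plasmid.
TOY_PLASMID = "ATGCGAATTCGGTACCATCAGCTGTTAACG"
SITES = {"EcoRI": "GAATTC", "PvuII": "CAGCTG"}

if __name__ == "__main__":
    print(unique_sites(TOY_PLASMID, SITES))
```

The same count, repeated after each subunit insertion, indicates whether a site remains unique in the hybrid plasmid.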

The amino acid sequence of the desired polypeptide is then analyzed in the context of determining availability of alternate codons for given amino acids (preferably in the context of codon preferences of the projected E. coli host). With this information in hand, two subunit DNA sequences are designed, preferably having a length on the order of about 150 base pairs, each coding for approximately one-half of the total amino acid sequence of the desired polypeptide. For purposes of this description, the two subunits manufactured will be referred to as “A” and “B”.
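The arithmetic behind the hypothetical design is simple: three bases per amino acid residue, divided among the subunits. The Python sketch below illustrates it under two simplifying assumptions that are not requirements of the method: the codons are divided as evenly as possible, and the extra bases needed for terminal regions and linkers are ignored. The function names are hypothetical.

```python
def gene_length_bp(n_residues):
    """Minimum coding length in base pairs: three bases per residue."""
    return 3 * n_residues


def plan_subunits(n_residues, n_subunits):
    """Divide codons 1..n_residues into contiguous, near-equal blocks.

    Returns a list of (first_codon, last_codon, length_bp) tuples. Bases
    added for terminal regions or linkers are not counted.
    """
    base, extra = divmod(n_residues, n_subunits)
    plan = []
    first = 1
    for i in range(n_subunits):
        size = base + (1 if i < extra else 0)
        last = first + size - 1
        plan.append((first, last, 3 * size))
        first = last + 1
    return plan


if __name__ == "__main__":
    print(gene_length_bp(100))
    for first, last, bp in plan_subunits(100, 2):
        print(f"codons {first}-{last}: {bp} base pairs")
```

In practice the boundary between subunits would be moved to wherever a suitable recognition site can be formed by codon selection, rather than fixed at the midpoint.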

The methods of the present invention as applied to two such subunits generally call for: insertion of one of the subunits into the assembly vector; amplification of the hybrid vector formed; and insertion of the second subunit to form a second hybrid including the assembled subunits in the proper sequence. Because the method involves joining the two subunits together in a manner permitting the joined ends to provide a continuous preselected sequence of bases coding for a continuous preselected sequence of amino acids, there exist certain requirements concerning the identity and sequence of the bases which make up the terminal regions of the manufactured subunits which will be joined to another subunit. Because the method calls for joining subunits to the assembly vector, there exist other requirements concerning the identity and sequence of the bases which make up those terminal regions of the manufactured subunits which will be joined to the assembly vector. Because the subunits are serially, rather than concurrently, inserted into the assembly vector (and because the methods are most beneficially practiced when the subunits can be selectively excised from assembled form to allow for alterations in selected base sequences therein), still further requirements exist concerning the identity of the bases in terminal regions of subunits manufactured. For ease of understanding in the following discussion of terminal region characteristics, the opposing terminal regions of subunits A and B are respectively referred to as A-1 and A-2, and B-1 and B-2.

Assume that an assembly strategy is developed wherein subunit A is to be inserted into pBR3000 first, with terminal region A-1 to be ligated to the vector at the EcoRI restriction site. In the simplest case, the terminal region is simply provided with an EcoRI “sticky end”, i.e., a single strand of four bases (-AATT- or -TTAA-) which will complement a single stranded sequence formed upon EcoRI digestion of pBR3000. This will allow ligation of terminal region A-1 to the vector upon treatment with ligase enzyme. Unless the single strand at the end of terminal region A-1 is preceded by an appropriate base pair, e.g.,

5′-G-
3′-CTTAA-

the entire recognition site will not be reconstituted upon ligation to the vector. Whether or not the EcoRI recognition site is reconstituted upon ligation (i.e., whether or not there will be 0 or 1 EcoRI sites remaining after insertion of subunit A into the vector) is at the option of the designer of the strategy. Alternatively, one may construct the terminal region A-1 of subunit A to include a complete set of base pairs providing a recognition site for some other endonuclease, hypothetically designated “XXX”, and then add on portions of the EcoRI recognition site as above to provide an EcoRI “linker”. To be of practical use in excising subunit A from an assembled sequence, the “XXX” site should not appear elsewhere in the hybrid plasmid formed upon insertion. The requirement for construction of terminal region A-1 is, therefore, that it comprise a portion (i.e., all or part) of a base sequence which provides a recognition site for cleavage by a restriction endonuclease, which recognition site is entirely present either once or not at all in the assembly vector upon insertion of the subunit.
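The designer's option described above, whether the EcoRI site survives ligation, turns on a single base pair. The Python sketch below illustrates the point under stated assumptions: the terminal region is modelled as an insert whose paired top strand ends next to the four-base single strand, the vector arm's top strand begins 5′-AATTC, and every base after that in `VECTOR_ARM_TOP` is invented filler. The function names are hypothetical.

```python
ECORI_SITE = "GAATTC"

# Top strand of a vector arm exposed by EcoRI digestion. "AATT" is the single
# stranded portion; the bases after "AATTC" are hypothetical filler.
VECTOR_ARM_TOP = "AATTCTCATGTTTGAC"


def junction_after_ligation(insert_top, vector_arm_top=VECTOR_ARM_TOP):
    """Six bases spanning the junction: the last paired base of the insert's
    top strand followed by the first five bases of the vector arm."""
    return (insert_top[-1] + vector_arm_top[:5]).upper()


def ecori_site_reconstituted(insert_top, vector_arm_top=VECTOR_ARM_TOP):
    """True when ligation restores a complete GAATTC recognition site."""
    return junction_after_ligation(insert_top, vector_arm_top) == ECORI_SITE


if __name__ == "__main__":
    # Insert ending in a G-C base pair: the site is restored.
    print(ecori_site_reconstituted("TCTCAG"))
    # Insert ending in an A-T base pair: ligation still occurs, no site remains.
    print(junction_after_ligation("TCTCAA"))
```

Either outcome satisfies the stated requirement that the site be present once or not at all after insertion.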

Assume that terminal region B-2 of subunit B is also to be joined to the assembly vector (e.g., at the single recognition site for PvuII cleavage present on pBR3000). The requirements for construction of terminal region B-2 are the same as for construction of A-1, except that the second endonuclease enzyme in reference to which the construction of B-2 is made must be different from that with respect to which the construction of A-1 is made. If recognition sites are the same, one will not be able to separately excise segments A and B from the fully assembled sequence.

The above assumptions require, then, that terminal region A-2 is to be ligated to terminal region B-1 in the final pBR3000 hybrid. Either the terminal region A-2 or the terminal region B-1 is constructed to comprise a portion of a (preferably palindromic six base) recognition site for restriction endonuclease cleavage by hypothetical third endonuclease “YYY” which recognition site will be entirely present once and only once in the expression vector upon insertion of all subunits thereinto, i.e., at an intermediate position in the assemblage of subunits. There exist a number of strategies for obtaining this result. In one alternative strategy, the entire recognition site of “YYY” is contained in terminal region A-2 and the region additionally includes the one or more portions of other recognition sites for endonuclease cleavage needed to (1) complete the insertion of subunit A into the assembly vector for amplification purposes, and (2) allow for subsequent joining of subunit A to subunit B. In this case, terminal region B-1 would have at its end only the bases necessary to link it to terminal region A-2. In another alternative, the entire “YYY” recognition site is included in terminal region B-1 and B-1 further includes at its end a portion of a recognition site for endonuclease cleavage which is useful for joining subunit A to subunit B.

As another alternative, terminal region B-1 may contain at its end a portion of the “YYY” recognition site. Terminal region A-2 would then contain the entire “YYY” recognition site plus, at its end, a suitable “linker” for joining A-2 to the assembly vector prior to amplification of subunit A (e.g., a PvuII “sticky end”). After amplification of the hybrid containing subunit A, the hybrid would be cleaved with “YYY” (leaving a sticky-ended portion of the “YYY” recognition site exposed on the end of A-2) and subunit B could be inserted with its B-1 terminal region joined with the end of terminal region A-2 to reconstitute the entire “YYY” recognition site. The requirement for construction of the terminal regions of all segments (other than A-1 and B-2) is that one or the other or both (i.e., “at least half”) comprise a portion (i.e., include all or part) of a recognition site for third restriction endonuclease cleavage, which recognition site is entirely present once and only once (i.e., is “unique”) in said assembly vector after insertion of all subunits thereinto. To generate a member of the class of novel DNA sequences of the invention, the recognition site of the third endonuclease should be a six base palindromic recognition site.

While a subunit “terminal region” as referred to above could be considered to extend from the subunit end fully halfway along the subunit to its center, as a practical matter the construction noted would ordinarily be performed in the final 10 or 20 bases. Similarly, while the unique “intermediate” recognition site in the two subunit assemblage may be up to three times closer to one end of the manufactured sequence than it is to the other, it will ordinarily be located near the center of the sequence. If, in the above description, a synthetic plan was generated calling for preparation of three subunits to be joined, the manufactured gene would include two unique restriction enzyme cleavage sites in intermediate positions at least one of which will have a palindromic six base recognition site in the class of new DNA sequences of the invention.

The significant advantages of the above-described process are manifest. Because the manufactured gene now includes one or more unique restriction endonuclease cleavage sites at intermediate positions along its length, modifications in the codon sequence of the two subunits joined at the cleavage site may be effected with great facility and without the need to re-synthesize the entire manufactured gene.

Following are illustrative examples of the actual practice of the invention in formation of manufactured genes capable of directing the synthesis of: human immune interferon (IFNγ) and analogs thereof; human leukocyte interferon of the F subtype (IFN-αF) and analogs thereof; and, multiple consensus leukocyte interferons which, due to homology to IFN-αF can be named as IFN-αF analogs. It will be apparent from these examples that the gene manufacturing methodology of the present invention provides an overall synthetic strategy for the truly rapid, efficient synthesis and expression of genes of a length in excess of 200 base pairs within a highly flexible framework allowing for variations in the structures of products to be expressed which has not heretofore been available to investigators practicing recombinant DNA techniques.

EXAMPLE 1

In the procedure for construction of synthetic genes for expression of human IFNγ a first selection made was the choice of E. coli as a microbial host for eventual expression of the desired polypeptides. Thereafter, codon selection procedures were carried out in the context of E. coli codon preferences enumerated in the Grantham publications, supra. A second selection made was the choice of pBR322 as an expression vector and, significantly, as the assembly vector to be employed in amplification of subunit sequences. In regard to the latter factor, the plasmid was selected with the knowledge that it included single BamHI, HindIII, and SalI restriction sites. With these restriction sites and the known sequence of amino acids in human immune interferon in mind, a general plan for formation of three “major” subunit DNA sequences (IF-3, IF-2 and IF-1) and one “minor” subunit DNA sequence (IF-4) was evolved. This plan is illustrated by Table IV below.

TABLE IV

[Table not reproduced. It set out subunits IF-4, IF-3, IF-2 and IF-1; the entries for IF-2 and IF-1 carry an EcoRI label.]

The “minor” sequence (IF-4) is seen to include codons for the 4th through 1st (5′-TGT TAC TGC CAG) amino acids and an ATG codon for an initiating methionine [Met⁻¹]. In this construction, it also includes additional bases to provide a portion of a control sequence involved in an expression vector assembly from pBR322 as described infra.

An alternative form of subunit IF-1 for use in synthesis of a manufactured gene for [Arg¹⁴⁰]IFNγ included the codon 5′-CGT in place of 5′-CAG (for [Gln]) at the codon site specifying the 140th amino acid.

The codon sequence plan for the top strand of the polypeptide-specifying portion of the total DNA sequence synthesized was as follows:

5′-TGT-TAC-TGC-CAG-GAT-CCG-TAC-GTT-AAG-GAA-GCA-
GAA-AAC-CTG-AAA-AAA-TAC-TTC-AAC-GCA-GGC-CAC-TCC-
GAC-GTA-GCT-GAT-AAC-GGC-ACC-CTG-TTC-CTG-GGT-
ATC-CTA-AAA-AAC-TGG-AAA-GAG-GAA-TCC-GAC-CTG-AAG-
ATC-ATG-CAG-TCT-CAA-ATT-GTA-AGC-TTC-TAC-TTC-
AAA-CTG-TTC-AAG-AAC-TTC-AAA-GAC-GAT-CAA-TCC-ATC-
CAG-AAG-AGC-GTA-GAA-ACT-ATT-AAG-GAG-GAC-ATG-
AAC-GTA-AAA-TCC-TTT-AAC-AGC-AAC-AAG-AAG-AAA-CGC-
GAT-GAC-TTC-GAG-AAA-CTG-ACT-AAC-TAC-TCT-GTT-
ACA-GAT-CTG-AAC-GTG-CAG-CGT-AAA-GCT-ATT-CAC-
GAA-CTG-ATC-CAA-GTT-ATG-GCT-GAA-CTG-TCT-CCT-GCG-
GCA-AAG-ACT-GGC-AAA-CGC-AAG-CGT-AGC-CAG-ATG-CTG-
TTT-CAG-[or CGT]-CGT-CGC-CGT-GCT-TCT-CAG.

In the above sequence, the control sequence bases and the initial methionine-specifying codon are not illustrated, nor are termination sequences or sequences providing a terminal SalI restriction site. Vertical lines separate top strand portions attributable to each of the subunit sequences.
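The codon sequence plan can be examined for the intermediate recognition sites relied upon in the assembly. The Python sketch below is an illustration only: it takes the top strand as printed above, with CAG at the 140th codon, and looks for the BamHI, HindIII and BglII sites. The recognition sequences for BamHI (GGATCC) and BglII (AGATCT) are supplied from general knowledge rather than from this text, and the helper names are hypothetical.

```python
from itertools import product

# Top strand as set out above, with CAG at codon 140.
TOP_STRAND = """
TGT-TAC-TGC-CAG-GAT-CCG-TAC-GTT-AAG-GAA-GCA-
GAA-AAC-CTG-AAA-AAA-TAC-TTC-AAC-GCA-GGC-CAC-TCC-
GAC-GTA-GCT-GAT-AAC-GGC-ACC-CTG-TTC-CTG-GGT-
ATC-CTA-AAA-AAC-TGG-AAA-GAG-GAA-TCC-GAC-CTG-AAG-
ATC-ATG-CAG-TCT-CAA-ATT-GTA-AGC-TTC-TAC-TTC-
AAA-CTG-TTC-AAG-AAC-TTC-AAA-GAC-GAT-CAA-TCC-ATC-
CAG-AAG-AGC-GTA-GAA-ACT-ATT-AAG-GAG-GAC-ATG-
AAC-GTA-AAA-TCC-TTT-AAC-AGC-AAC-AAG-AAG-AAA-CGC-
GAT-GAC-TTC-GAG-AAA-CTG-ACT-AAC-TAC-TCT-GTT-
ACA-GAT-CTG-AAC-GTG-CAG-CGT-AAA-GCT-ATT-CAC-
GAA-CTG-ATC-CAA-GTT-ATG-GCT-GAA-CTG-TCT-CCT-GCG-
GCA-AAG-ACT-GGC-AAA-CGC-AAG-CGT-AGC-CAG-ATG-CTG-
TTT-CAG-CGT-CGC-CGT-GCT-TCT-CAG
"""

# Recognition sequences, 5' to 3'. All three are palindromic, so searching
# the top strand alone is sufficient.
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT", "BglII": "AGATCT"}

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def clean(seq):
    """Keep only the base letters, dropping hyphens and line breaks."""
    return "".join(b for b in seq.upper() if b in "ACGT")


def translate(dna):
    """Translate codon by codon using the correlations of Table II."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def site_positions(dna, site):
    """Zero-based start offsets of every occurrence of `site`."""
    return [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]


GENE = clean(TOP_STRAND)
PROTEIN = translate(GENE)

if __name__ == "__main__":
    print(len(GENE), "bases;", len(PROTEIN), "codons")
    for name, site in SITES.items():
        print(name, site_positions(GENE, site))
```

Each of the three sites is found once, at an intermediate position in the coding sequence, which is the property that allows a single subunit to be excised and replaced.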

The following example illustrates a preferred general procedure for preparation of deoxyoligonucleotides for use in the manufacture of DNA sequences of the invention.

EXAMPLE 2

Oligonucleotide fragments were synthesized using a four-step procedure and several intermediate washes. Polymer bound dimethoxytrityl protected nucleoside in a sintered glass funnel was first stripped of its 5′-protecting group (dimethoxytrityl) using 3% trichloroacetic acid in dichloromethane for 1½ minutes. The polymer was then washed with methanol, tetrahydrofuran and acetonitrile. The washed polymer was then rinsed with dry acetonitrile, placed under argon and then treated in the condensation step as follows. 0.5 ml of a solution of 10 mg tetrazole in acetonitrile was added to the reaction vessel containing polymer. Then 0.5 ml of 30 mg protected nucleoside phosphoramidite in acetonitrile was added. This reaction was agitated and allowed to react for 2 minutes. The reactants were then removed by suction and the polymer rinsed with acetonitrile. This was followed by the oxidation step wherein 1 ml of a solution containing 0.1 molar I₂ in 2,6-lutidine/H₂O/THF, 1:2:2, was reacted with the polymer bound oligonucleotide chain for 2 minutes. Following a THF rinse, capping was done using a solution of dimethylaminopyridine (6.5 g in 100 ml THF) and acetic anhydride in the proportion 4:1 for 2 minutes. This was followed by a methanol rinse and a THF rinse. Then the cycle began again with a trichloroacetic acid in CH₂Cl₂ treatment. The cycle was repeated until the desired oligonucleotide sequence was obtained.

The final oligonucleotide chain was treated with thiophenol, dioxane and triethylamine, 1:2:2, for 45 minutes at room temperature. Then, after rinsing with dioxane, methanol and diethylether, the oligonucleotide was cleaved from the polymer with concentrated ammonium hydroxide at room temperature. After decanting the solution from the polymer, the concentrated ammonium hydroxide solution was heated at 60° C. for 16 hours in a sealed tube.

Each oligonucleotide solution was then extracted four times with 1-butanol. The solution was loaded into a 20% polyacrylamide 7 molar urea electrophoresis gel and, after running, the appropriate product DNA band was isolated.
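For planning purposes, the reaction time of the four-step cycle may be totalled from the durations stated above. The Python sketch below is a rough illustration under two assumptions not found in the text: an oligonucleotide of n bases is taken to need n − 1 cycles because the first nucleoside is already polymer bound, and wash and rinse times, which the text does not state, are left out. The names used are hypothetical.

```python
from fractions import Fraction

# Reaction times in minutes, as stated in the procedure above.
# Washes and rinses are omitted because their durations are not given.
STEP_MINUTES = {
    "detritylation": Fraction(3, 2),
    "condensation": 2,
    "oxidation": 2,
    "capping": 2,
}


def coupling_cycles(length):
    """Cycles needed for an oligonucleotide of `length` bases, assuming the
    first nucleoside is already bound to the polymer."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return length - 1


def reaction_minutes(length):
    """Total stated reaction time, excluding washes, for all cycles."""
    return coupling_cycles(length) * sum(STEP_MINUTES.values())


if __name__ == "__main__":
    # A 15-base segment, the minimum segment length mentioned earlier.
    print(float(reaction_minutes(15)), "minutes of reaction time")
```

The figure is a lower bound on bench time, since the intermediate washes, deprotection, cleavage and gel purification are not included.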

Subunits were then assembled from deoxyoligonucleotides according to the general procedure for assembly of subunit IF-1.

Following the isolation of the desired 14 DNA segments, subunit IF-1 was constructed in the following manner:

1. One nanomole of each of the DNA fragments, excluding segment 13 and segment 2, which contain 5′ cohesive ends, was subjected to 5′-phosphorylation;

2. The complementary strands of DNA, segments 13 and 14, 11 and 12, 9 and 10, 7 and 8, 5 and 6, 3 and 4 and 1 and 2, were combined together, warmed to 90° and slowly cooled to 25°;

3. The resulting annealed pairs of DNA were combined sequentially and warmed to 37° and slowly cooled to 25°;

4. The concentration of ATP and DTT in the final tube containing segments 1 thru 14 was adjusted to 150 μM and 18 mM respectively. Twenty units of T-4 DNA ligase was added to this solution and the reaction was incubated at 4° for 18 hrs;

5. The resulting crude product was heated to 90° for 2 min. and subjected to gel filtration on Sephadex G50/40 using 10 mM triethyl ammonium bicarbonate as the eluent;

6. The desired product was purified, following 5′ phosphorylation, using an 8% polyacrylamide-TBE gel.

Subunits IF-2, IF-3 and IF-4 were constructed in a similar manner.

The following example relates to: assembly of the complete human immune interferon gene from subunits IF-1, IF-2, IF-3, and IF-4; procedures for the growing, under appropriate nutrient conditions, of transformed E. coli cells; the isolation of human immune interferon from the cells; and the testing of biological activity of interferon so isolated.

EXAMPLE 3

The major steps in the general procedure for assembly of the complete human IFNγ specifying genes from subunits IF-1, IF-2, and IF-3 are illustrated in FIG. 1.

The 136 base pair subunit IF-1 was electro-eluted from the gel, ethanol precipitated and resuspended in water at a concentration of 0.05 pmol/μl. Plasmid pBR322 (2.0 pmol) was digested with EcoRI and SalI, treated with phosphatase, phenol extracted, ethanol precipitated, and resuspended in water at a concentration of 0.1 pmol/μl. Ligation was carried out with 0.1 pmol of the plasmid and 0.2 pmol of subunit IF-1, using T-4 DNA ligase to form hybrid plasmid pINT1. E. coli were transformed and multiple copies of pINT1 were isolated therefrom.
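The quantities stated for the pINT1 ligation imply the following volumes and molar ratio. The short Python sketch below merely works through that arithmetic; the function name `volume_ul` is hypothetical, and the figure for the plasmid stock volume assumes complete recovery of the 2.0 pmol of digested plasmid, which the text does not state.

```python
from fractions import Fraction


def volume_ul(amount_pmol: str, conc_pmol_per_ul: str) -> Fraction:
    """Volume (in microliters) that contains the requested amount at the given concentration."""
    return Fraction(amount_pmol) / Fraction(conc_pmol_per_ul)


# Amounts and concentrations as stated in the text.
plasmid_ul = volume_ul("0.1", "0.1")     # 0.1 pmol of plasmid at 0.1 pmol/ul
insert_ul = volume_ul("0.2", "0.05")     # 0.2 pmol of subunit IF-1 at 0.05 pmol/ul
molar_ratio = Fraction("0.2") / Fraction("0.1")   # insert : vector

# Assumption: all 2.0 pmol of digested plasmid is recovered after precipitation.
plasmid_stock_ul = volume_ul("2.0", "0.1")

if __name__ == "__main__":
    print(f"plasmid volume: {float(plasmid_ul)} ul")
    print(f"insert volume:  {float(insert_ul)} ul")
    print(f"insert:vector molar ratio: {float(molar_ratio)}")
    print(f"plasmid stock volume (assumed full recovery): {float(plasmid_stock_ul)} ul")
```

The ligation thus uses a twofold molar excess of the subunit over the plasmid.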

The above procedure was repeated for purposes of inserting the 153 base pair subunit IF-2 to form pINT2 except that the plasmid was digested with EcoRI and BglII. The 153 base pair IF-3 subunit was similarly inserted into pINT2 during manufacture of pINT3 except that EcoRI and HindIII were used to digest the plasmid.

An IF-4 subunit was employed in the construction of the final expression vector as follows: Plasmid PVvI was purchased from Stanford University, Palo Alto, Calif., and digested with PvuII. Using standard procedures, an EcoRI recognition site was inserted in the plasmid at a PvuII site. Copies of this hybrid were then digested with EcoRI and HpaI to provide a 245 base pair sequence including a portion of the trp promoter/operator region. By standard procedures, IF-4 was added to the HpaI site in order to incorporate the remaining 37 base pairs of the complete trp translational initiation signal and bases providing codons for the initial four amino acids of immune interferon (Cys-Tyr-Cys-Gln). The resulting assembly was then inserted into pINT3 which had been digested with EcoRI and BamHI to yield a plasmid designated pINTγ-trpI7.

E. coli cells containing pINTγ-trpI7 were grown on K media in the absence of tryptophan to an O.D.₆₀₀ of 1. Indoleacrylic acid was added at a concentration of 20 μg per ml and the cells were cultured for an additional 2 hours at 37° C. Cells were harvested by centrifugation and the cell pellet was resuspended in fetal calf serum buffered with HEPES (pH 8.0). Cells were lysed by one passage through a French press at 10,000 psi. The cell lysate was cleared of debris by centrifugation and the supernatant was assayed for antiviral activity by the CPE assay [“The Interferon System” Stewart, ed., Springer-Verlag, N.Y., N.Y. (1981)]. The isolated product of expression was designated γ-1.

The following example relates to a modification in the DNA sequence of plasmid pINTγ-trpI7 which facilitated the use of the vector in the trp promoter-controlled expression of structural genes coding for, e.g., analogs of IFN-γ and IFN-αF.

EXAMPLE 4

Segment IF-4, as previously noted, had been constructed to include bases coding for an initial methionine and the first four amino acids of IFN-γ as well as 37 base pairs (commencing at its 5′ end with a HpaI blunt end) which completed the 3′ end of a trp promoter/operator sequence, including a Shine-Dalgarno ribosome binding sequence. It was clear that manipulations involving sequences coding for IFN-γ analogs and for polypeptides other than IFN-γ would be facilitated if a restriction site 3′ to the entire trp promoter/operator region could be established. By way of illustration, sequences corresponding to IF-4 for other genes could then be constructed without having to reconstruct the entire 37 base pairs needed to reconstitute the trp promoter/operator and would only require bases at the 5′ end such as would facilitate insertion in the proper reading frame with the complete promoter/operator.

Consistent with this goal, sequence IF-4 was reconstructed to incorporate an XbaI restriction site 3′ to the base pairs completing the trp promoter/operator. The construction is shown in Table V below.

TABLE V HpaI

This variant form of segment IF-4 was inserted in pINTγ-trpI7 (digested with HpaI and BamHI) to generate plasmid pINTγ-TXb4, from which the IFN-γ-specifying gene could be deleted by digestion with XbaI and SalI and the entire trp promoter/operator would remain on the large fragment.

The following example relates to construction of structural analogs of IFN-γ whose polypeptide structure differs from that of IFN-γ in terms of the identity or location of one or more amino acids.

EXAMPLE 5

A first class of analogs of IFN-γ was formed which included a lysine residue at position 81 in place of asparagine. The single base sequence change needed to generate this analog was in subunit IF-2 of Table IV in segments 35 and 36. The asparagine-specifying codon, AAC, was replaced by the lysine-specifying codon, AAG. The isolated product of expression of such a modified DNA sequence, [Lys⁸¹]IFN-γ, was designated γ-10.

Another class of IFNγ analogs consists of polypeptides wherein one or more potential glycosylation sites present in the amino acid sequence are deleted. More particularly, these consist of [Arg¹⁴⁰]IFNγ or [Gln¹⁴⁰]IFNγ wherein the polypeptide sequence fails to include one or more naturally occurring sequences, [(Asn or Gln)-(ANY)-(Ser or Thr)], which are known to provide sites for glycosylation of the polypeptide. One such sequence in IFNγ spans positions 28 through 30 (Asn-Gly-Thr); another spans positions 101 through 103 (Asn-Tyr-Ser). Preparation of an analog according to the invention with a modification at positions 28-30 involved cleavage of plasmid containing all four IFN-γ subunits with BamHI and HindIII to delete subunit IF-3, followed by insertion of a variant of subunit IF-3 wherein the AAC codon for asparagine therein is replaced by the codon for glutamine, CAG. (Such replacement is effected by modification of deoxyoligonucleotide segment 37 to include CAG rather than AAC and of segment 38 to include GTC rather than TTG. See Table IV.) The isolated product of expression of such a modified DNA sequence, [Gln²⁸]IFN-γ, was designated γ-12. Polypeptide analogs of this type would likely not be glycosylated if expressed in yeast cells. Polypeptide analogs as so produced are not expected to differ appreciably from naturally-occurring IFNγ in terms of reactivity with antibodies to the natural form, or in duration of antiproliferative or immunomodulatory pharmacological effects, but may display enhanced potency of pharmacological activity in one or more respects.

Other classes of IFNγ analogs consist of polypeptides wherein the [Trp³⁹] residue is replaced by [Phe³⁹], and/or wherein one or more of the methionine residues at amino acid positions 48, 80, 120 and 137 are replaced by, e.g., leucine, and/or wherein cysteines at amino acid positions 1 and 3 are replaced by, e.g., serine or are completely eliminated. These last-mentioned analogs may be more easily isolated upon microbial expression because they lack the capacity for intermolecular disulfide bridge formation.

Replacement of tryptophan with phenylalanine at position 39 required substitution for a TGG codon in subunit IF-3 with TTC (although TTT could also have been used), effected by modification of the deoxyoligonucleotide segment 33 (TGG to TTC) and overlapping segment 36 (TGA to TAC) used to manufacture IF-3. [Phe³⁹, Lys⁸¹]IFN-γ, the isolated product of expression of such a modified DNA sequence (which also included the above-noted replacement of asparagine by lysine at position 81) was designated γ-5.

In a like manner, replacement of one or more methionines at positions 48, 80, 120, and 137, respectively, involves alteration of subunit IF-3 (with reconstruction of deoxyoligonucleotides 31, 32 and 34), subunit IF-2 (with reconstruction of deoxyoligonucleotide segments 21 and 22); and subunit IF-1 (with reconstruction of deoxyoligonucleotide segments 7 and 10 and/or 3 and 4). An analog of IFN-γ wherein threonine replaced methionine at position 48 was obtained by modification of segment 31 in subunit IF-3 to delete the methionine-specifying codon ATG and replace it with an ACT codon. Alterations in segment 34 (TAC to TGA) were also needed to effect this change. [Thr⁴⁸, Lys⁸¹]IFN-γ, the isolated product of expression of such a modified DNA sequence (also including a lysine-specifying codon at position 81) was designated γ-6.

Replacement or deletion of cysteines at positions 1 and 3 involves only alteration of subunit IF-4. As a first example, modifications in construction of subunit IF-4 to replace both of the cysteine-specifying codons at positions 1 and 3 (TGT and TGC, respectively) with the serine-specifying codon, TCT, required reconstruction of only 2 segments (see e and f of Table IV). [Ser¹, Ser³, Lys⁸¹]IFN-γ, the isolated product of expression of the thus modified [Lys⁸¹]IFN-γ DNA sequence, was designated γ-2. As another example, [Lys¹, Lys², Gln³, Lys⁸¹]IFN-γ, designated γ-3, was obtained as an expression product of a modified construction of subunit IF-4 wherein codons AAA, AAA, and CAA respectively replaced TGT, TAC and TGC. Finally, [des-Cys¹, des-Tyr², des-Cys³, Lys⁸¹]IFN-γ, designated γ-4, was obtained by means of modification of subunit IF-4 sections to

5′-ATG CAG-3′
3′-TAC GTC-5′
in the amino acid specifying region. It should be noted that the above modifications in the initial amino acid coding regions of the gene were greatly facilitated by the construction of pINTγ-TXb4 in Example 4 which meant that only short sequences with XbaI and BamHI sticky ends needed to be constructed to complete the amino terminal protein coding sequence and link the gene to the complete trp promoter.
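The codon replacements recited in this example can be checked against the standard genetic code. The following Python sketch is offered only as an illustration: the helper names (`translate`, `base_changes`) and the residue-position labels are introduced here, and the amino terminal coding strings are assembled from the codons named in the text on the assumption that the initial methionine codon is ATG and the glutamine codon at position 4 is CAG.

```python
from itertools import product

# Standard genetic code, built in TCAG order (a widely used compact encoding).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a top-strand coding sequence (5'->3') into one-letter residues."""
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def base_changes(old: str, new: str) -> int:
    """Number of base positions at which two codons differ."""
    return sum(a != b for a, b in zip(old, new))


# (label, original codon, replacement codon) as recited in the text.
SUBSTITUTIONS = [
    ("Asn81 -> Lys81", "AAC", "AAG"),
    ("Asn28 -> Gln28", "AAC", "CAG"),
    ("Trp39 -> Phe39", "TGG", "TTC"),
    ("Met48 -> Thr48", "ATG", "ACT"),
    ("Cys1 -> Ser1", "TGT", "TCT"),
    ("Cys3 -> Ser3", "TGC", "TCT"),
]

# Amino terminal coding regions (Met-1 onward); assembled here for illustration.
N_TERMINI = {
    "natural (Cys-Tyr-Cys-Gln)": "ATG TGT TAC TGC CAG",
    "Ser1, Ser3 variant": "ATG TCT TAC TCT CAG",
    "Lys1, Lys2, Gln3 variant": "ATG AAA AAA CAA CAG",
    "des-Cys1, des-Tyr2, des-Cys3 variant": "ATG CAG",
}

if __name__ == "__main__":
    for label, old, new in SUBSTITUTIONS:
        print(f"{label}: {old} ({translate(old)}) -> {new} ({translate(new)}), "
              f"{base_changes(old, new)} base change(s)")
    for label, dna in N_TERMINI.items():
        print(f"{label}: {translate(dna)}")
```

Only the lysine-81 and serine-1 replacements are single base changes under this check; the others alter two bases of the codon.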

Among other classes of IFN-γ analog polypeptide provided by the present invention are those including polypeptides which differ from IFN-γ in terms of amino acids traditionally held to be involved in secondary and tertiary configuration of polypeptides. As an example, provision of a cysteine residue at an intermediate position in the IFN-γ polypeptide may generate a species of polypeptide structurally facilitative of formation of intramolecular disulfide bridges between amino terminal and intermediate cysteine residues such as found in IFN-α. Further, insertion or deletion of prolines in polypeptides according to the invention may alter linear and bending configurations with corresponding effects on biological activity. [Lys⁸¹, Cys⁹⁵]IFN-γ, designated γ-9, was isolated upon expression of a DNA sequence fashioned with

5′-TCG-3′
3′-AGC-5′
replacing

5′-TTC-3′
3′-AAG-5′
in sections 17 and 18 of subunit IF-2. A DNA sequence specifying [Cys⁹⁵]IFN-γ (to be designated γ-11) is being constructed by the same general procedure. Likewise, a gene coding for [Cys Pro]IFN-γ is under construction with the threonine-specifying codon ACA (section 15 of IF-2) being replaced by the proline-specifying codon CCA.

[Glu⁵]IFN-γ, to be designated γ-13, will result from modification of section 43 in subunit IF-3 to include the glutamate codon, GAA, rather than the aspartic acid specifying codon, GAT. Because such a change would no longer permit the presence of a BamHI recognition site at that locus, subunit IF-3 will likely need to be constructed as a composite subunit with the amino acid specifying portions of subunit IF-4, leaving no restriction site between XbaI and HindIII in the assembled gene. This analog of IFN-γ is expected to be less acid labile than the naturally-occurring form.

The above analogs having the above-noted tryptophan and/or methionine and/or cysteine replacements are not expected to differ from naturally-occurring IFNγ in terms of reactivity with antibodies to the natural form or in potency or antiproliferative or immunomodulatory effect but are expected to have enhanced duration of pharmacological effects.

Still another class of analogs consists of polypeptides of a “hybrid” or “fused” type which include one or more additional amino acids at the end of the prescribed sequence. These would be expressed by DNA sequences formed by the addition, to the entire sequence coding for IFNγ, of another manufactured DNA sequence, e.g., one of the subunits coding for a sequence of polypeptides peculiar to LeIFN-Con, described infra. The polypeptide expressed is expected to retain at least some of the antibody reactivity of naturally-occurring IFNγ and to display some degree of the antibody reactivity of LeIFN. Its pharmacological activities are expected to be superior to naturally-occurring IFN-γ both in terms of potency and duration of action.

Table VI, below, sets forth the results of studies of antiviral activity of IFN-γ prepared according to the invention along with that of certain of the analogs tested. Relative antiviral activity was assayed in human HeLa cells infected with encephalomyocarditis virus (EMCV) per unit binding to a monoclonal antibody to IFN-γ as determined in an immunoabsorbent assay.

TABLE VI

Interferon    Relative Antiviral Activity
γ-1           1.00
γ-4           0.60
γ-5           0.10
γ-6           0.06
γ-10          0.51

The following example relates to modifications in the polypeptide coding region of the DNA sequences of the previous examples which serve to enhance the expression of desired products.

EXAMPLE 6

Preliminary analyses performed on the polypeptide products of microbial expression of manufactured DNA sequences coding for IFN-γ and analogs of IFN-γ revealed that two major proteins were produced in approximately equal quantities: a 17K form corresponding to the complete 146 amino acid sequence and a 12K form corresponding to an interferon fragment missing about 50 amino acids of the amino terminal. Review of codon usage in the manufactured gene revealed the likelihood that the abbreviated species was formed as a result of microbial translation initiation at the Met⁴⁸ residue brought about by the similarity of base sequences 3′ thereto to a Shine-Dalgarno ribosome binding sequence. It thus appeared that while about half of the transcribed mRNAs bound to ribosomes only at a locus prior to the initial methionine, the other half were bound at a locus prior to the Met⁴⁸ codon. In order to diminish the likelihood of ribosome binding internally within the polypeptide coding region, sections 33 and 34 of subunit IF-3 were reconstructed. More specifically, the GAG codon employed to specify a glutamate residue at position 41 was replaced by the alternate, GAA, codon and the CGT codon employed to specify arginine at position 45 was replaced by the alternate, CGC, codon. These changes, effected during construction of the gene specifying the γ-6 analog of IFN-γ, resulted in the expression of a single predominant species of polypeptide of the appropriate length.
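The two codon replacements described above alter the base sequence without altering the encoded polypeptide. A minimal Python sketch of that check follows; the dictionary lists only the four codons named in the text (with their standard genetic code assignments), and the function names are hypothetical.

```python
# Standard genetic code assignments for the four codons named in the text.
CODON_TO_RESIDUE = {"GAG": "Glu", "GAA": "Glu", "CGT": "Arg", "CGC": "Arg"}

# (residue position, original codon, alternate codon) as recited in the text.
CHANGES = [(41, "GAG", "GAA"), (45, "CGT", "CGC")]


def is_silent(old: str, new: str) -> bool:
    """True when both codons specify the same amino acid."""
    return CODON_TO_RESIDUE[old] == CODON_TO_RESIDUE[new]


def changed_positions(old: str, new: str) -> list:
    """1-based positions within the codon at which the bases differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(old, new)) if a != b]


if __name__ == "__main__":
    for position, old, new in CHANGES:
        print(f"position {position}: {old} -> {new}, "
              f"silent={is_silent(old, new)}, "
              f"codon base(s) changed={changed_positions(old, new)}")
```

Both replacements fall in the third base of the codon, which is why the amino acid sequence of the expressed product is unchanged.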

The following Examples 7 and 8 relate to procedures of the invention for generating a manufactured gene specifying the F subtype of human leukocyte interferon (“LeuIFN-F” or “IFN-αF”) and polypeptide analogs thereof.

EXAMPLE 7

The amino acid sequence for the human leukocyte interferon of the F subtype has been deduced by way of sequencing of cDNA clones. See, e.g., Goeddel, et al., Nature, 290, pp. 20-26 (1981). The general procedures of prior Examples 1, 2 and 3 were employed in the design and assembly of a manufactured DNA sequence for use in microbial expression of IFN-αF in E. coli by means of a pBR322-derived expression vector. A general plan for the construction of three “major” subunit DNA sequences (LeuIFN-F I, LeuIFN-F II and LeuIFN-F III) and one “minor” subunit DNA sequence (LeuIFN-F IV) was evolved and is shown in Table VII below.

TABLE VII LeuIFN-F IV

LeuIFN-F III

LeuIFN-F II

LeuIFN-F I

As in the case of the gene manufacture strategy set out in Table IV, the strategy of Table VII involves use of bacterial preference codons wherever it is not inconsistent with deoxyribonucleotide segment constructions. Construction of an expression vector with the subunits was similar to that involved with the IFNγ-specifying gene, with minor differences in restriction enzymes employed. Subunit I is ligated into pBR322 cut with EcoRI and SalI. (Note that the subunit terminal portion includes a single stranded SalI “sticky end” but, upon complementation, a SalI recognition site is not reconstituted. A full BamHI recognition site remains, however, allowing for subsequent excision of the subunit.) This first intermediate plasmid is amplified and subunit II is inserted into the amplified plasmid after again cutting with EcoRI and SalI. The second intermediate plasmid thus formed is amplified and subunit III is inserted into the amplified plasmid cut with EcoRI and HindIII. The third intermediate plasmid thus formed is amplified. Subunit IV is ligated to an EcoRI and XbaI fragment isolated from pINTγ-TXb4 of Example 4 and this ligation product (having EcoRI and BstEII sticky ends) is then inserted into the amplified third intermediate plasmid cut with EcoRI and BstEII to yield the final expression vector.

The isolated product of trp promoter/operator controlled E. coli expression of the manufactured DNA sequence of Table VII as inserted into the final expression vector was designated IFN-αF₁.

EXAMPLE 8

As discussed infra with respect to consensus leukocyte interferon, those human leukocyte interferon subtypes having a threonine residue at position 14 and a methionine residue at position 16 are reputed to display greater antiviral activity than those subtypes possessing Ala¹⁴ and Ile¹⁶ residues. An analog of human leukocyte interferon subtype F was therefore manufactured by means of microbial expression of a DNA sequence of Example 7 which had been altered to specify threonine and methionine as residues 14 and 16, respectively. More specifically, [Thr¹⁴, Met¹⁶]IFN-αF, designated IFN-αF₂, was expressed in E. coli upon transformation with a vector of Example 7 which had been cut with SalI and HindIII and into which a modified subunit II (of Table VII) was inserted. The specific modifications of subunit II involved assembly with segment 39 altered to replace the alanine-specifying codon, GCT, with a threonine-specifying ACT codon and replace the isoleucine-specifying codon, ATT, with an ATG codon. Corresponding changes in complementary bases were made in section 40 of subunit LeuIFN-F II.

The following Examples 9 and 10 relate to practice of the invention in the microbial synthesis of consensus human leukocyte interferon polypeptides which can be designated as analogs of human leukocyte interferon subtype F.

EXAMPLE 9

“Consensus human leukocyte interferon” (“IFN-Con,” “LeuIFN-Con”) as employed herein shall mean a non-naturally-occurring polypeptide which predominantly includes those amino acid residues which are common to all naturally-occurring human leukocyte interferon subtype sequences and which includes, at one or more of those positions wherein there is no amino acid common to all subtypes, an amino acid which predominantly occurs at that position and in no event includes any amino acid residue which is not extant in that position in at least one naturally-occurring subtype. (For purposes of this definition, subtype A is positionally aligned with other subtypes and thus reveals a “missing” amino acid at position 44.) As so defined, a consensus human leukocyte interferon will ordinarily include all known common amino acid residues of all subtypes. It will be understood that the state of knowledge concerning naturally-occurring subtype sequences is continuously developing. New subtypes may be discovered which may destroy the “commonality” of a particular residue at a particular position. Polypeptides whose structures are predicted on the basis of a later-amended determination of commonality at one or more positions would remain within the definition because they would nonetheless predominantly include common amino acids and because those amino acids no longer held to be common would nonetheless quite likely represent the predominant amino acids at the given positions. Failure of a polypeptide to include either a common or predominant amino acid at any given position would not remove the molecule from the definition so long as the residue at the position occurred in at least one subtype. Polypeptides lacking one or more internal or terminal residues of consensus human leukocyte interferon or including internal or terminal residues having no counterpart in any subtype would be considered analogs of human consensus leukocyte interferon.

Published predicted amino acid sequences for eight cDNA-derived human leukocyte interferon subtypes were analyzed in the context of the identities of amino acids within the sequence of 166 residues. See, generally, Goeddel, et al., Nature, 290, pp. 20-26 (1981) comparing LeIFN-A through LeIFN-H and noting that only 79 amino acids appear in identical positions in all eight interferon forms and 99 amino acids appear in identical positions if the E subtype (deduced from a cDNA pseudogene) was ignored. Each of the remaining positions was analyzed for the relative frequency of occurrence of a given amino acid and, where a given amino acid appeared at the same position in at least five of the eight forms, it was designated as the predominant amino acid for that position. A “consensus” polypeptide sequence of 166 amino acids was plotted out and compared back to the eight individual sequences, resulting in the determination that LeIFN-F required few modifications from its “naturally-occurring” form to comply with the consensus sequence.
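The rule just described (a residue is “common” when present in all eight forms and “predominant” when present in at least five of the eight) may be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and the eight four-residue sequences in `TOY_ALIGNMENT` are invented for demonstration and are not taken from the published subtype sequences.

```python
from collections import Counter


def consensus_call(column, predominant_min=5):
    """Classify one aligned position across all subtype sequences.

    Returns (residue, "common") when every sequence agrees,
    (residue, "predominant") when at least `predominant_min` agree,
    and (None, "unresolved") otherwise.
    """
    residue, count = Counter(column).most_common(1)[0]
    if count == len(column):
        return residue, "common"
    if count >= predominant_min:
        return residue, "predominant"
    return None, "unresolved"


def consensus(aligned_sequences, predominant_min=5):
    """Apply consensus_call to every column of equal-length aligned sequences."""
    return [consensus_call(col, predominant_min) for col in zip(*aligned_sequences)]


# Hypothetical toy alignment of eight sequences (NOT real interferon data).
TOY_ALIGNMENT = [
    "CDLP", "CDLP", "CDLS", "CDMS",
    "CDMT", "CNLT", "CDQP", "CDLA",
]

if __name__ == "__main__":
    for position, call in enumerate(consensus(TOY_ALIGNMENT), start=1):
        print(position, call)
```

Positions left unresolved by this rule correspond to those where the text indicates a residue was chosen on other grounds, e.g., retention of the native IFN-αF residue to preserve a restriction site.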

A program for construction of a manufactured IFN-Con DNA sequence was developed and is set out below in Table VIII. In the table, an asterisk designates the variations in IFN-αF needed to develop LeIFN-Con₁, i.e., to develop the [Arg²², Ala⁷⁶, Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰, Leu⁹⁶, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸] analog of IFN-αF. The illustrated top strand sequence includes, wherever possible, codons noted to be the subject of preferential expression in E. coli. The sequence also includes bases providing recognition sites for SalI, HindIII, and BstEII at positions intermediate the sequence and for XbaI and BamHI at its ends. The latter sites are selected for use in incorporation of the sequence in a pBR322 vector, as was the case with the sequence developed for IFN-αF and its analogs.

TABLE VIII

-1  1                                   10
Met-Cys-Asp-Leu-Pro-Gln-Thr-His-Ser-Leu-Gly-Asn-
ATG TGT GAT TTA CCT CAA ACT CAT TCT CTT GGT AAC

                                20      *
Arg-Arg-Ala-Leu-Ile-Leu-Leu-Ala-Gln-Met-Arg-Arg-
CGT CGC GCT CTG ATT CTG CTG GCA CAG ATG CGT CGT

                        30
Ile-Ser-Pro-Phe-Ser-Cys-Leu-Lys-Asp-Arg-His-Asp-
ATT TCC CCG TTT AGC TGC CTG AAA GAC CGT CAC GAC

                40
Phe-Gly-Phe-Pro-Gln-Glu-Glu-Phe-Asp-Gly-Asn-Gln-
TTC GGC TTT CCG CAA GAA GAG TTC GAT GGC AAC CAA

        50
Phe-Gln-Lys-Ala-Gln-Ala-Ile-Ser-Val-Leu-His-Glu-
TTC CAG AAA GCT CAG GCA ATC TCT GTA CTG CAC GAA

60                                      70
Met-Ile-Gln-Gln-Thr-Phe-Asn-Leu-Phe-Ser-Thr-Lys-
ATG ATC CAA CAG ACC TTC AAC CTG TTT TCC ACT AAA

                *       *   *   80
Asp-Ser-Ser-Ala-Ala-Trp-Asp-Glu-Ser-Leu-Leu-Glu-
GAC AGC TCT GCT GCT TGG GAC GAA AGC TTG CTG GAG

        *               *90
Lys-Phe-Tyr-Thr-Glu-Leu-Tyr-Gln-Gln-Leu-Asn-Asp-
AAG TTC TAC ACT GAA CTG TAT CAG CAG CTG AAC GAC

*               100
Leu-Glu-Ala-Cys-Val-Ile-Gln-Glu-Val-Gly-Val-Glu-
CTG GAA GCA TGC GTA ATC CAG GAA GTT GGT GTA GAA

        110
Glu-Thr-Pro-Leu-Met-Asn-Val-Asp-Ser-Ile-Leu-Ala-
GAG ACT CCG CTG ATG AAC GTC GAC TCT ATT CTG GCA

120                                     130
Val-Lys-Lys-Tyr-Phe-Gln-Arg-Ile-Thr-Leu-Tyr-Leu-
GTT AAA AAG TAC TTC CAG CGT ATC ACT CTG TAC CTG

                                140
Thr-Glu-Lys-Lys-Tyr-Ser-Pro-Cys-Ala-Trp-Glu-Val-
ACC GAA AAG AAA TAT TCT CCG TGC GCT TGG GAA GTA

                        150
Val-Arg-Ala-Glu-Ile-Met-Arg-Ser-Phe-Ser-Leu-Ser-
GTT CGC GCT GAA ATT ATG CGT TCT TTC TCT CTG TCT

*   *   *       160                     166
Thr-Asn-Leu-Gln-Glu-Arg-Leu-Arg-Arg-Lys-Glu Stop Stop
ACT AAC CTG CAG GAG CGT CTG CGC CGT AAA GAA TAA  TAG
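As an illustrative cross-check, the top strand of Table VIII can be translated with the standard genetic code and scanned for the internal SalI and HindIII recognition sequences (GTCGAC and AAGCTT). The Python sketch below transcribes the codons exactly as they appear in the table; the helper names are hypothetical and the sketch is not part of the described procedure.

```python
from itertools import product

# Standard genetic code in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Codons transcribed from Table VIII (Met-1 through the two stop codons).
TABLE_VIII_TOP_STRAND = """
ATG TGT GAT TTA CCT CAA ACT CAT TCT CTT GGT AAC
CGT CGC GCT CTG ATT CTG CTG GCA CAG ATG CGT CGT
ATT TCC CCG TTT AGC TGC CTG AAA GAC CGT CAC GAC
TTC GGC TTT CCG CAA GAA GAG TTC GAT GGC AAC CAA
TTC CAG AAA GCT CAG GCA ATC TCT GTA CTG CAC GAA
ATG ATC CAA CAG ACC TTC AAC CTG TTT TCC ACT AAA
GAC AGC TCT GCT GCT TGG GAC GAA AGC TTG CTG GAG
AAG TTC TAC ACT GAA CTG TAT CAG CAG CTG AAC GAC
CTG GAA GCA TGC GTA ATC CAG GAA GTT GGT GTA GAA
GAG ACT CCG CTG ATG AAC GTC GAC TCT ATT CTG GCA
GTT AAA AAG TAC TTC CAG CGT ATC ACT CTG TAC CTG
ACC GAA AAG AAA TAT TCT CCG TGC GCT TGG GAA GTA
GTT CGC GCT GAA ATT ATG CGT TCT TTC TCT CTG TCT
ACT AAC CTG CAG GAG CGT CTG CGC CGT AAA GAA TAA TAG
"""


def translate(dna: str) -> str:
    """Translate a coding strand (5'->3') into one-letter residues ('*' = stop)."""
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


coding = "".join(TABLE_VIII_TOP_STRAND.split())
protein = translate(coding)
mature = protein[1:].rstrip("*")   # drop Met-1 and the trailing stop codons


def residue(position: int) -> str:
    """One-letter residue at a 1-based position of the mature sequence."""
    return mature[position - 1]


if __name__ == "__main__":
    print(len(mature), "residues in the mature sequence")
    print("Residues 22, 80, 83, 114, 121:",
          residue(22), residue(80), residue(83), residue(114), residue(121))
    print("SalI site (GTCGAC) present:", "GTCGAC" in coding)
    print("HindIII site (AAGCTT) present:", "AAGCTT" in coding)
```

The residues reported at positions 80, 83, 114 and 121 can be compared with the Ser, Glu, Val and Lys residues the text describes as retained from native IFN-αF.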

Table IX below sets out the specific double stranded DNA sequence for preparation of 4 subunit DNA sequences for use in manufacture of IFN-Con₁. Subunit LeuIFN-Con IV is a duplicate of LeuIFN-F IV of Table VII. Segments of subunits which differ from those employed to construct the IFN-αF gene are designated with a “prime” (e.g., 37′ and 38′ are altered forms of sections 37 and 38 needed to provide arginine rather than glycine at position 22).

TABLE IX LeuIFN Con IV

LeuIFN Con III

LeuIFN Con II

LeuIFN Con I

The four subunits of Table IX were sequentially inserted into an expression vector according to the procedure of Example 7 to yield a vector having the coding region of Table VIII under control of a trp promoter/operator. The product of expression of this vector in E. coli was designated IFN-Con₁. It will be noted that this polypeptide includes all common residues indicated in Goeddel, et al., supra, and, with the exception of Ser⁸⁰, Glu⁸³, Val¹¹⁴, and Lys¹²¹, includes the predominant amino acid indicated by analysis of the reference's summary of sequences. The four above-noted residues were retained from the native IFN-αF sequence to facilitate construction of subunits and assembly of subunits into an expression vector. (Note, e.g., serine was retained at position 80 to allow for construction of a HindIII site.)

Since publication of the Goeddel, et al. summary of IFN-α subtypes, a number of additional subtypes have been ascertained. FIGS. 2A-2C set out in tabular form the deduced sequences of the 13 presently known subtypes (exclusive of those revealed by five known cDNA pseudogenes) with designations of the same IFN-α subtypes from different laboratories indicated parenthetically (e.g., IFN-α6 and IFN-αK). See, e.g., Goeddel, et al., supra; Stebbing, et al., in: Recombinant DNA Products, Insulin, Interferons and Growth Hormones (A. Bollon, ed.), CRC Press (1983); and Weissmann, et al., U.C.L.A. Symp. Mol. Cell. Biol., 25, pp. 295-326 (1982). Positions where there is no common amino acid are shown in bold face. IFN-α subtypes are roughly grouped on the basis of amino acid residues. In seven positions (14, 16, 71, 78, 79, 83, and 160) the various subtypes show just two alternative amino acids, allowing classification of the subtypes into two subgroups (I and II) based on which of the seven positions are occupied by the same amino acid residues. Three IFN-α subtypes (H, F, and B) cannot be classified as Group I or Group II and, in terms of distinguished positions, they appear to be natural hybrids of both group subtypes. It has been reported that IFN-α subtypes of the Group I type display relatively high antiviral activity while those of Group II display relatively high antitumor activity.

IFN-Con₁ structure is described in the final line of FIGS. 2A-2C. It is noteworthy that certain residues of IFN-Con₁ (e.g., serine at position 8) which were determined to be “common” on the basis of the Goeddel, et al., sequences are now seen to be “predominant.” Further, certain of the IFN-Con₁ residues determined to be predominant on the basis of the reference (Arg²², Asp⁷⁸, Glu⁷⁹, and Tyr⁸⁶) are no longer so on the basis of updated information, while certain heretofore nonpredominant others (Ser⁸⁰ and Glu⁸³) now can be determined to be predominant.

EXAMPLE 10

A human consensus leukocyte interferon which differed from IFN-Con₁ in terms of the identity of amino acid residues at positions 14 and 16 was prepared by modification of the DNA sequence coding for IFN-Con₁. More specifically, the expression vector for IFN-Con₁ was treated with BstEII and HindIII to delete subunit LeuIFN Con III. A modified subunit was inserted wherein the alanine-specifying codon, GCT, of sections 39 and 40 was altered to a threonine-specifying codon, ACT, and the isoleucine codon, ATT, was changed to ATG. The product of expression of the modified manufactured gene, [Thr¹⁴, Met¹⁶, Arg²², Ala⁷⁶, Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰, Leu⁹⁶, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸]IFN-αF, was designated IFN-Con₂.

Presently being constructed is a gene for a consensus human leukocyte interferon polypeptide which will differ from IFN-Con₁ in terms of the identity of residues at positions 114 and 121. More specifically, the Val¹¹⁴ and Lys¹²¹ residues which duplicate IFN-αF subtype residues but are not predominant amino acids will be changed to the predominant Glu¹¹⁴ and Arg¹²¹ residues, respectively. Because the codon change from Val¹¹⁴ to Glu¹¹⁴ (e.g., GTC to GAA) will no longer allow for a SalI site at the terminal portion of subunit LeuIFN Con I (of Table IX), subunits I and II will likely need to be constructed as a single subunit. Changing the AAA, lysine, codon of sections 11 and 12 to CGT will allow for the presence of arginine at position 121. The product of microbial expression of the manufactured gene, [Arg²², Ala⁷⁶, Asp⁷⁸, Glu⁷⁹, Tyr⁸⁶, Tyr⁹⁰, Leu⁹⁶, Glu¹¹⁴, Arg¹²¹, Thr¹⁵⁶, Asn¹⁵⁷, Leu¹⁵⁸]IFN-αF, will be designated IFN-Con₃.

The following example relates to procedures for enhancing levels of expression of exogenous genes in bacterial species, especially E. coli.

EXAMPLE 11

In the course of development of expression vectors in the above examples, the trp promoter/operator DNA sequence was employed which included a ribosome binding site (“RBS”) sequence in a position just prior to the initial translation start (Met⁻¹, ATG). An attempt was made to increase levels of expression of the various exogenous genes in E. coli by incorporating DNA sequences duplicative of portions of putative RBS sequences extant in genomic E. coli DNA sequences associated with highly expressed cellular proteins. Ribosome binding site sequences of such protein-coding genes as reported in Inokuchi, et al., Nuc. Acids Res., 10, pp. 6957-6968 (1982), Gold, et al., Ann. Rev. Microbiol., 35, pp. 365-403 (1981) and Alton, et al., Nature, 282, pp. 864-869 (1979), were reviewed and the determination was made to employ sequences partially duplicative of those associated with the E. coli proteins OMP-F (outer membrane protein F), CRO and CAM (chloramphenicol transacetylase).

By way of example, to duplicate a portion of the OMP-F RBS sequence the following sequence is inserted prior to the Met⁻¹ codon.

5′-AACCATGAGGGTAATAAATA-3′
3′-TTGGTACTCCCATTATTTAT-5′

In order to incorporate this sequence in a position prior to the protein coding region of, e.g., the manufactured gene coding for IFN-Con₁ or IFN-αF₁, subunit IV of the expression vector was deleted (by cutting the vector with XbaI and BstEII) and replaced with a modified subunit IV involving altered sections 41A and 42A and the replacement of sections 43 and 44 with new segments RB1 and RB2. The construction of the modified sequence is as set out in Table X, below.

TABLE X

Table XI, below, illustrates the entire DNA sequence in the region preceding the protein coding region of the reconstructed gene starting with the HpaI site within the trp promoter/operator (compare subunit IF-4 of Table IV).

TABLE XI

HpaI                                    XbaI
AAC TAG TAC GCA AGT TCA CGT AAA AAG GGT ATC TAG AAA CCA
TTG ATC ATG CGT TCA AGT GCA TTT TTC CCA TAG ATC TTT GGT

                    -1  1   2   3   4   5   6   7   8   9   BstEII
                    Met Cys Asp Leu Pro Gln Thr His Ser Leu
TGA GGG TAA TAA ATA ATG TGT GAT TTA CCT CAA ACT CAT TCT CTT G
ACT CCC ATT ATT TAT TAC ACA CTA AAT GGA GTT TGA GTA AGA GAA CATG

Similar procedures were followed to incorporate sequences duplicative of RBS sequences of CRO and CAM genes, resulting in the following sequences immediately preceding the Met⁻¹ codon.

      1        10        20
      *        *         *
CRO:  GCATGTACTAAGGAGGTTGT
      CGTACATGATTCCTCCAACA

      1        10        20
      *        *         *
CAM:  CAGGAGCTAAGGAAGCTAAA
      GTCCTCGATTCCTTCGATTT

It will be noted that all the RBS sequence inserts possess substantial homology to Shine-Dalgarno sequences, are rich in adenine and include sequences ordinarily providing “stop” codons.
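The features noted for the three inserts can be tallied with a short script. This is a sketch under stated assumptions: the hexamer AGGAGG is used here as a reference Shine-Dalgarno core, which is a choice made for illustration and is not specified in the text, and all function names are hypothetical.

```python
# Upper strands of the three RBS-duplicating inserts, as printed in the text.
INSERTS = {
    "OMP-F": "AACCATGAGGGTAATAAATA",
    "CRO": "GCATGTACTAAGGAGGTTGT",
    "CAM": "CAGGAGCTAAGGAAGCTAAA",
}

STOP_TRIPLETS = ("TAA", "TAG", "TGA")
SD_CORE = "AGGAGG"  # assumed reference core, not taken from the text


def adenine_fraction(seq: str) -> float:
    """Fraction of bases in seq that are adenine."""
    return seq.count("A") / len(seq)


def stop_triplets(seq: str):
    """Return (index, triplet) for every stop triplet in any reading frame."""
    return [(i, seq[i:i + 3]) for i in range(len(seq) - 2)
            if seq[i:i + 3] in STOP_TRIPLETS]


def longest_core_match(seq: str, core: str = SD_CORE) -> int:
    """Length of the longest contiguous piece of core that occurs in seq."""
    for size in range(len(core), 0, -1):
        for start in range(len(core) - size + 1):
            if core[start:start + size] in seq:
                return size
    return 0


for name, seq in INSERTS.items():
    print(name,
          "A fraction:", round(adenine_fraction(seq), 2),
          "stop triplets:", stop_triplets(seq),
          "longest match to core:", longest_core_match(seq))
```

The stop triplets are reported without regard to reading frame, so the tally shows only that such sequences are present in each insert, not that translation would terminate there.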

Levels of E. coli expression of IFN-Con₁ were determined using trp-controlled expression vectors incorporating the three RBS inserts (in addition to the RBS sequence extant in the complete trp promoter/operator). Expression of the desired polypeptide using the OMP-F RBS duplicating sequence was from 150 to 300 mg per liter of culture, representing from 10 to 20 percent of total protein. Vectors incorporating the CAM RBS duplicating sequence provided levels of expression which were about one-half that provided by the OMP-F variant. Vectors including the CRO RBS duplicating sequence yielded the desired protein at levels of about one-tenth that of the OMP-F variant.
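The relative levels stated above translate into approximate yield ranges as follows. This is simple arithmetic on the figures in the text; the "about one-half" and "about one-tenth" factors are treated as exact for illustration, so the CAM and CRO ranges are rough estimates rather than reported measurements.

```python
# Reported OMP-F expression range, in mg of IFN-Con1 per liter of culture.
OMPF_RANGE_MG_PER_L = (150.0, 300.0)

# Divisors relative to the OMP-F variant ("about one-half", "about one-tenth").
RELATIVE_DIVISOR = {"OMP-F": 1, "CAM": 2, "CRO": 10}

# Estimated (low, high) yield for each RBS variant.
levels = {
    name: tuple(value / divisor for value in OMPF_RANGE_MG_PER_L)
    for name, divisor in RELATIVE_DIVISOR.items()
}

for name, (low, high) in levels.items():
    print(f"{name}: roughly {low:g} to {high:g} mg per liter")
```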

The following example relates to antiviral activity screening of human leukocyte interferon and polypeptides provided by the preceding examples.

EXAMPLE 12

Table XII below provides the results of testing of antiviral activity in various cell lines of natural (buffy coat) interferon and isolated, microbially-expressed polypeptides designated IFN-αF₁, IFN-αF₂, IFN-Con₁, and IFN-Con₂. Viruses used were VSV (vesicular stomatitis virus) and EMCV (encephalomyocarditis virus). Cell lines were from various mammalian sources, including human (WISH, HeLa), bovine (MDBK), mouse (MLV-6), and monkey (Vero). Antiviral activity was determined by an end-point cytopathic effect assay as described in Weck, et al., J. Gen. Virol., 57, pp. 233-237 (1981) and Campbell, et al., Can. J. Microbiol., 21, pp. 1247-1253 (1975). Data shown were normalized for antiviral activity in WISH cells.

TABLE XII

Virus  Cell Line  Buffy Coat  IFN-αF₁  IFN-αF₂  IFN-Con₁  IFN-Con₂
VSV    WISH       100         100      100      100       100
VSV    HeLa       400         100      ND*      200       100
VSV    MDBK       1600        33       ND       200       300
VSV    MLV-6      20          5        ND       3         20
VSV    Vero       10          0.1      ND       10        0.1
EMCV   WISH       100         100      100      100       100
EMCV   HeLa       100         5        ND       33        33
EMCV   Vero       100         20       ND       1000      10

*ND: no data presently available.
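One way to read Table XII is to express each polypeptide's activity as a ratio to natural (buffy coat) interferon in the same virus and cell line. The sketch below transcribes the table and computes that ratio; the data layout and the function name `ratio_to_buffy_coat` are illustrative choices, and ND entries are represented as None.

```python
# Table XII: relative antiviral activity, normalized to 100 in WISH cells.
# Columns: buffy coat, IFN-alphaF1, IFN-alphaF2, IFN-Con1, IFN-Con2.
COLUMNS = ("Buffy Coat", "IFN-aF1", "IFN-aF2", "IFN-Con1", "IFN-Con2")
TABLE_XII = {
    ("VSV", "WISH"): (100, 100, 100, 100, 100),
    ("VSV", "HeLa"): (400, 100, None, 200, 100),
    ("VSV", "MDBK"): (1600, 33, None, 200, 300),
    ("VSV", "MLV-6"): (20, 5, None, 3, 20),
    ("VSV", "Vero"): (10, 0.1, None, 10, 0.1),
    ("EMCV", "WISH"): (100, 100, 100, 100, 100),
    ("EMCV", "HeLa"): (100, 5, None, 33, 33),
    ("EMCV", "Vero"): (100, 20, None, 1000, 10),
}


def ratio_to_buffy_coat(virus: str, cell_line: str, product: str):
    """Activity of product divided by buffy coat activity, or None if ND."""
    row = TABLE_XII[(virus, cell_line)]
    value = row[COLUMNS.index(product)]
    if value is None:
        return None
    return value / row[0]


for (virus, cell_line) in TABLE_XII:
    con1 = ratio_to_buffy_coat(virus, cell_line, "IFN-Con1")
    print(virus, cell_line, "IFN-Con1 / buffy coat =", con1)
```

Because every column is separately normalized to its own WISH value, these ratios compare host-range profiles rather than absolute specific activities.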

It will be apparent from the above examples that the present invention provides, for the first time, an entire new genus of synthesized, biologically active proteinaceous products which differ from naturally-occurring forms in terms of the identity and/or location of one or more amino acids and in terms of one or more biological (e.g., antibody reactivity) and pharmacological (e.g., potency or duration of effect) properties but which substantially retain other such properties.

Products of the present invention and/or antibodies thereto may be suitably “tagged”, for example radiolabelled (e.g., with I¹²⁵), conjugated with enzymes or fluorescently labelled, to provide reagent materials useful in assays and/or diagnostic test kits, for the qualitative and/or quantitative determination of the presence of such products and/or said antibodies in fluid samples. Such antibodies may be obtained from the inoculation of one or more animal species (e.g., mice, rabbit, goat, human, etc.) or from monoclonal antibody sources. Any of such reagent materials may be used alone or in combination with a suitable substrate, e.g., coated on a glass or plastic particle bead.

Numerous modifications and variations in the practice of the invention are expected to occur to those skilled in the art upon consideration of the foregoing illustrative examples. Consequently, the invention should be considered as limited only to the extent reflected by the appended claims.

1. [Met⁻¹, des-Cys¹, des-Tyr², des-Cys³] A Met⁻¹, des-Cys¹, des-Tyr², des-Cys³ analog polypeptide of human IFN-γ polypeptide produced encoded by a DNA sequence coding therefor in a transformant organism, said polypeptide having substantially the characteristics of human immune interferon. transformed host cell.
2. A process for producing a Met⁻¹, des-Cys¹, des-Tyr², des-Cys³ analog polypeptide of human IFN-γ comprising the steps of (a) growing a host cell transformed with a DNA sequence encoding the analog polypeptide, whereby said host cell expresses said DNA sequences, and (b) isolating the polypeptide produced by step (a).
3. The process of claim 2 wherein the host cell is E. coli.